Introduction

To deal with the problem of translating many problem-oriented languages
into many machine languages, three main lines of attack have been
suggested.

(1) That the multiplicity of problem-oriented languages be reduced by
the adoption of a universal algorithmic language, e.g., ALGOL. This
legislative manner of abolishing the difficulty does not seem to be
a complete solution: such languages as have been proposed lack
universality in varying ways. For example, ALGOL has no provision
for the processing of strings of symbols. In addition, it is not
at all clear that present ideas of what constitutes a universal
language will be valid in a future with time-sharing and perhaps
even self-organizing computers.

(2) That a common machine-oriented language be devised. This language
(UNCOL for short) is thought of as an intermediary language through
which translation will be made. Each problem-oriented language is
to be translated to UNCOL by a translator that can be written in
UNCOL. An UNCOL to machine-language translation completes the
process.

(3) That translators be so constructed that they accept the description
of a source language and are thereby converted into translators for
that language. For each machine, only one such translator need be
built.

This  report  follows  the  third  approach. 
In order to give a degree of universality to a compiler, two things must
be done. First, there must be some method of describing the source language;
and, second, there must be some way of describing the properties of the
machine for which translation is made. In great measure, the first problem
was solved by the introduction by Backus of a notation for describing the
syntax of ALGOL 1/. This notation is related to similar notations in
linguistics (phrase-structure grammar in substitution form 2/) and in logic
(productions 3/). The second problem is one of considerable difficulty.
Although it is possible to describe the properties of a computing machine,
as is done in any reference manual, such descriptions are not in a form which
is simple to manipulate mechanically. This report proposes an alternative:
that the description of the source language should not be made independently
of the target language but should exploit any properties of the target language
that are useful. For example, if the machine has the ability in one
instruction to add the absolute value of a number, the source language should
be described with that operation as one of its primitives, rather than the two
primitives of addition and taking the absolute value.

This report is divided into four sections. The first section proposes
a mechanism for scanning a linear text and performing a syntactic analysis.
A pseudo-machine, the Syntax Machine, is described, whose programs may be
considered to define the language of the text. The output from the Syntax
Machine is a string whose evaluation leads to a (partial) translation of the
source text.

1/ J. W. Backus et al., "Report on the Algorithmic Language ALGOL 60,"
Communications ACM 2, p. 299, May 1960.

2/ N. Chomsky, "Three Models for the Description of Language,"
Tr. IRE, IT-2, No. 3, p. 113, Sept. 1956.

3/ M. Davis, Computability and Unsolvability, Ch. 6, McGraw-Hill,
New York, 1958.
The second part of the report discusses, mainly by examples, the
application of the Syntax Machine to translation for a particular
target-machine language, and shows how the syntax description may be written
to exploit the special features of the target machine.

The third section considers the role of Declarations in the source
language and the mechanisms required to effect them in the translation process.
The fourth section deals with a supplementary process of assembly which is
required to evaluate the strings produced by the Syntax Machine.

1.2  A Notation for Syntax

The notation to be presented is similar to that of Backus, but with an
important difference. Whereas the notation of Backus enables texts
conformable with the rules of syntax to be derived by substitutions, the
present notation is used to express a decision procedure that tests whether
an example of text conforms to the rules.

The decision procedure tests the legality of a string by applying one
of three types of tests to the string. Let us denote syntactic variables
by enclosing the name of the variable within the brackets < >, and denote
syntactic constants (i.e., characters of the alphabet) by themselves. The
three types of test and their notation are:

(1) Is the string a value of a syntactic variable which is the concatenate
of other syntactic variables or constants?

This is expressed by the formula

    <A> ::= <B> <C> <D> ... <X>

where juxtaposition in the formula signifies concatenation in the
string tested, and the sign ::= means that the variable on the left
is defined by the expression on the right.
(2) Is the string a value of a syntactic variable defined as being an
alternative of several variables?

This is expressed by the formula

    <A> ::= <B> | <C> | <D> | ... | <X>

where the connective | denotes that the variables are alternatives,
in the sense that the string is a value of <A> if it is a value
of <B>, or of <C>, and so on.

(3) Is the string a concatenate of several strings with the last string
repeated an indefinite number of times (perhaps none)?

This is expressed by the formula

    <A> ::= <B> <C> <D> ... {<X>}

where { } denotes iterated concatenation, and the definiens has at
least one term before the iterated concatenation.

In the foregoing it has been tacitly assumed that the tests implied by the
right-hand sides of these expressions are taken in the order of writing. If
this is now adopted as a convention of the formalism, then the formulae
express algorithms for testing whether strings are values of syntactic
variables.

The formulae now have the corresponding interpretations.

(1) The string is an <A> if a head string is found to be a <B>, and
the head of the remaining part of the string is a <C>, and so on.

(2) The string is an <A> if it is a <B>, or if not that, then a <C>,
and so on.

(3) The interpretation is similar to that of the first type, but with
the last component iterated.
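
The three interpretations can be sketched as small Python functions. This is
only an illustration, not part of the formalism: the names const, concat, alt
and iterate, and the convention that a test takes (text, pos) and returns the
new position or None, are assumptions made for the sketch.

```python
# Each test takes (text, pos) and returns the new position, or None on failure.

def const(ch):
    """Syntactic constant: matches one character of the alphabet."""
    def test(text, pos):
        return pos + 1 if text[pos:pos + 1] == ch else None
    return test

def concat(*tests):
    """Type (1): <A> ::= <B> <C> ... <X>, applied in the order of writing."""
    def test(text, pos):
        for t in tests:
            pos = t(text, pos)
            if pos is None:
                return None
        return pos
    return test

def alt(*tests):
    """Type (2): <A> ::= <B> | <C> | ...; the first test that succeeds wins."""
    def test(text, pos):
        for t in tests:
            new_pos = t(text, pos)
            if new_pos is not None:
                return new_pos
        return None
    return test

def iterate(head, last):
    """Type (3): <A> ::= <B> {<X>}; head once, then last zero or more times."""
    def test(text, pos):
        pos = head(text, pos)
        if pos is None:
            return None
        while True:
            new_pos = last(text, pos)
            if new_pos is None or new_pos == pos:   # stop when nothing more is read
                return pos
            pos = new_pos
    return test

# Hypothetical definitions used only to exercise the three types.
A = alt(const("a"), const("b"), const("c"))   # <A> ::= a | b | c
B = concat(A, A)                              # <B> ::= <A> <A>
C = iterate(const("x"), const("y"))           # <C> ::= x {y}
```

A test succeeds when it returns a position; whether the whole string has been
consumed is a separate check made by the caller.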
1.3  Syntax Notation as Program. The Syntax Machine

In this section a pseudo-machine, called the Syntax Machine, will be
defined that uses the definitions of the previous sections as programs to
decide whether strings are values of syntactic variables.

Consider a machine with an input tape on each consecutive position of
which is inscribed one character of a string to be analyzed. The machine
obeys program steps of the form F, AT, AF where F specifies the action to
be taken, and AT, AF specify the addresses of the next program steps. For
each character of the alphabet and for some important subclasses there is a
machine instruction of a type called a "Comparator." A Comparator
instruction, say for the character "X", will read the character presently
under the reading head on the input tape. If the character is "X", then the
input tape is moved by one character position and the next instruction of
the program is taken from address AT. If the character is not "X", then the
tape is not moved and the next instruction is taken from address AF. Where
the Comparator is for a subset of the characters the action is similar: if
the character under the reading head belongs to the subset, the tape is moved
and the next instruction is taken from location AT.

In programming for this machine, another type of program step may be used,
the Recognizer: it is a subroutine composed out of Comparators and
Recognizers. To call a subroutine S, a special function of the machine,
denoted here by S*, AT, AF, is used. Its action is to copy the present
position of the input tape on to the current level of the control push-down
list, together with the addresses AT, AF, in parallel lists. Then the level
of control is increased by 1 and the next program step is taken from
location S. Two more special instructions provide for exits from
subroutines, in case of failure or success of the decision process. These
functions, called "False" and "True," decrease the level of control by 1 and
cause the next program step to be taken from the AF or AT address,
respectively, in the control push-down list. In the case of "False" the
input tape is repositioned to be as it was when the subroutine was entered.

By this means Recognizers can be constructed that act like Comparators,
but recognize strings of characters.

With this apparatus it is possible to write programs for the syntax
definitions of the previous section.


Examples

1.  <A> ::= a | b | c
    recognizes the occurrence of the character a or b or c.

2.  <B> ::= <A> <A>
    if A is as defined in Ex. 1, this recognizes the pairs of characters
    aa, ab, ac, ba, bb, bc, ca, cb, cc.

3.  <C> ::= x {y}
    recognizes x, xy, xyy, xyyy, etc.

4.  <I> ::= <L> {<NL>}
    recognizes ALGOL identifiers, if L is a recognizer (or comparator) for
    alphabet letters and NL is a recognizer for letters and numerals.

Programs for these examples may be written in the "machine" instruction
notation as follows:

          Label    Function    AT    AF

    1)    A        C(a)        S4    S1
          S1       C(b)        S4    S2
          S2       C(c)        S4    S3
          S3       False
          S4       True

    2)    B        A*          S5    S3
          S5       A*          S4    S3

    3)    C        C(x)        S6    S3
          S6       C(y)        S6    S4

    4)    I        L*          S7    S3
          S7       NL*         S7    S4

Here C(x) denotes the Comparator for x, and similarly for the other
characters.
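
As an illustration only, the machine and the four programs tabulated above
can be simulated in a few lines of Python. The tuple encoding of program
steps, the kind names CMP, CALL, TRUE and FALSE, the sentinel addresses
ACCEPT and REJECT, and the one-step recognizers L and NL are all assumptions
of this sketch rather than part of the machine as defined.

```python
import string

LETTERS = set(string.ascii_letters)
ALNUM = LETTERS | set(string.digits)

# label -> (function, AT, AF); labels S3 and S4 are the shared False/True exits.
PROGRAM = {
    "A":  (("CMP", {"a"}), "S4", "S1"),
    "S1": (("CMP", {"b"}), "S4", "S2"),
    "S2": (("CMP", {"c"}), "S4", "S3"),
    "S3": (("FALSE",), None, None),
    "S4": (("TRUE",), None, None),
    "B":  (("CALL", "A"), "S5", "S3"),
    "S5": (("CALL", "A"), "S4", "S3"),
    "C":  (("CMP", {"x"}), "S6", "S3"),
    "S6": (("CMP", {"y"}), "S6", "S4"),
    "I":  (("CALL", "L"), "S7", "S3"),
    "S7": (("CALL", "NL"), "S7", "S4"),
    "L":  (("CMP", LETTERS), "S4", "S3"),   # assumed: L as a subset comparator
    "NL": (("CMP", ALNUM), "S4", "S3"),     # assumed: NL as a subset comparator
}

def run(program, start, text):
    """Apply recognizer `start` to `text`; return (accepted, tape position)."""
    pos = 0
    # Control push-down list: (saved tape position, AT, AF) for each level.
    stack = [(pos, "ACCEPT", "REJECT")]      # outermost call, sentinel exits
    addr = start
    while addr not in ("ACCEPT", "REJECT"):
        func, at, af = program[addr]
        kind = func[0]
        if kind == "CMP":                    # Comparator for a set of characters
            if pos < len(text) and text[pos] in func[1]:
                pos += 1                     # move the tape, take the AT exit
                addr = at
            else:
                addr = af                    # tape not moved, take the AF exit
        elif kind == "CALL":                 # S*: enter the subroutine S
            stack.append((pos, at, af))
            addr = func[1]
        elif kind == "TRUE":
            _, addr, _ = stack.pop()
        else:                                # "FALSE": reposition the input tape
            pos, _, addr = stack.pop()
    return addr == "ACCEPT", pos
```

For instance, run(PROGRAM, "C", "xyy") follows S6 twice and leaves by the
True exit with the tape three positions along.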
In this notation, we can write programs for which there is no
representation in the algebraic formalism; this will be convenient on
occasion. We could define syntax in terms of programs for the syntax
machine; this, likewise, may enable us to write some forms of syntax not
representable by the algebraic formalism, or if so, only by uneconomical
programs.

An additional feature of great power will be to allow subroutines to
store bits in a list working in parallel with the push-down list, so that a
syntactical property recognized in a subroutine may be tested and cause
branching in the routine controlling it. In a binary computer it will be
easy to store many bits in the same machine word (usually at least 30 in most
binary computers).

Two functions are required: *

(a) M(X). Copy a bit into bit position X in the (k-1)th level of the
push-down list; k is the current level of the routine in which M(X) acts.
X is specified using the data field of the instruction; the next
instruction is taken from the address specified in the AT field.

(b) K(X). If the pseudo-machine is currently operating on level k, examine
the X bit on level k. If it is 1, proceed to the address specified by
the AT address; if it is 0, proceed to the address specified by AF.

When a subroutine is entered in level k from level k-1, the set of bits
(or marks, as they will sometimes be called) should be set to 0.


* There are many ways of doing this. It would be more economical
in machine time and storage to allow the M and K functions
to set and test many bits. For simplicity of exposition,
we adopt the simplest M and K functions.
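
A minimal sketch of the mark list, assuming one integer word of marks per
level of control. The class name MarkList and the methods enter and leave are
hypothetical; only M and K correspond to functions described in the text.

```python
class MarkList:
    """One word of marks per level, kept in parallel with the push-down list."""

    def __init__(self):
        self.words = [0]            # level 0: the outermost routine

    def enter(self):
        """Subroutine entry: the new level starts with all marks set to 0."""
        self.words.append(0)

    def leave(self):
        """Subroutine exit (True or False): drop the current level."""
        self.words.pop()

    def M(self, x):
        """Set bit x on level k-1, the level of the calling routine."""
        self.words[-2] |= 1 << x

    def K(self, x):
        """Test bit x on the current level k."""
        return bool((self.words[-1] >> x) & 1)
```

A subroutine that recognizes some property calls M(X) before returning; the
routine that called it can then branch on K(X).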


1.4  Flow Diagrams for the Syntax Machine


The simplicity of the operations of the syntax machine makes it
possible to write flow diagrams precisely, by the use of the following
conventions.

(1) Unless otherwise indicated by arrows, the flow of control is
across the page from left to right, or downwards.

(2) Unless otherwise indicated, true exits from comparators are
written horizontally, and false exits vertically.

(3) Comparators are indicated by circles containing the character
to be compared; Recognizers are indicated by the name of the
recognizer, enclosed in angular brackets.

(4) To indicate the M function that places a mark in the push-down
list, write M(X) in the diagram, where X is the mark. For mark
comparators, use K(X), with exit conventions as with comparators.

(5) To minimize lines of control, nodes of the flow diagram may be
labeled. Recognizer exits will be labeled "True" or "False."

Example: <A> ::= <B> {a} may be diagramed as

[flow diagram not reproduced]
1.5  Recursive Programs for the Syntax Machine

In this section we investigate certain properties of the machine; in
particular, we ask for rules for constructing programs that will always
provide a decision. An example shows that it is possible for programs to
cycle indefinitely (e.g., a single instruction whose AF address is the
address of the instruction itself). However, there is one main source of
danger in programs using notation of the three standard types, that of the
careless use of recursion. The manner of constructing subroutines allows
recursive definitions to be used.

Consider the program <A> ::= <A> <B>. In order to test
whether the text is a value of <A>, the question is asked, "is the string
of characters starting at this point an example of <A>?"

This question is answered if two subsidiary questions are answered in the
affirmative. The first question is exactly the same as the original and is
asked at exactly the same position of the input tape. However, in the program
<A> ::= <B> <A> this circularity does not arise, because the question,
"is the string an example of <A>?" is never asked twice at the same
position of the input string. The tape will have moved because of the
application of the program step <B>, which must have a successful outcome
(and hence the input tape moves) before <A> is applied again. The first
example is of a program with an "infinite loop"; the second is a finite
program, if applied to a text of finite length (and in practice, all texts
are finite).

Circularity in programs is not always so easy to discern as in the above
example. There is, however, a simple rule whose successive application checks
absence of circularity. A program step is non-circular if all program steps
in its immediate definition are non-circular when it is defined by a formula
of type 2 (i.e., as a set of alternatives) or (for formulae of types 1 and 3)
if the first step is non-circular. Any program step that is a Comparator is
non-circular. For example, in the formulae of types 1 and 3

    <A> ::= <B> <C> <D> ...          or
    <A> ::= <B> <C> <D> ... {<X>}

<A> is non-circular if <B> is non-circular. In the formula of type 2

    <A> ::= <B> | <C> | <D> ...

<A> is non-circular only if <B>, <C>, <D> ... are all non-circular. *

All steps in a program must finally be non-circular. The proof of this rule
follows from the observation that a non-circular program step either exits
via the "False" exit, or it moves the tape forward.

Recursive definition is permissible subject to this rule.
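
The rule lends itself to a mechanical check. The sketch below is one possible
reading of it, under an assumed encoding: a program is a dictionary from
recognizer names to ("concat" | "iter" | "alt", list of steps), and any step
that is not a key of the dictionary is taken to be a Comparator.

```python
def is_non_circular(program, name, in_progress=None):
    """Apply the rule of section 1.5 to the step `name`."""
    in_progress = set() if in_progress is None else in_progress
    if name not in program:         # a Comparator is always non-circular
        return True
    if name in in_progress:         # asked again before the tape can have moved
        return False
    kind, steps = program[name]
    in_progress.add(name)
    if kind == "alt":               # type 2: every alternative must pass
        ok = all(is_non_circular(program, s, in_progress) for s in steps)
    else:                           # types 1 and 3: only the first step matters
        ok = is_non_circular(program, steps[0], in_progress)
    in_progress.discard(name)
    return ok

# <A> ::= <A> <B> is circular; <A> ::= <B> <A> is not.
LEFT = {"A": ("concat", ["A", "B"]), "B": ("concat", ["b"])}
RIGHT = {"A": ("concat", ["B", "A"]), "B": ("concat", ["b"])}

# Circularity hidden behind an alternative: <X> ::= <Y> | x, <Y> ::= <X> y.
HIDDEN = {"X": ("alt", ["Y", "x"]), "Y": ("concat", ["X", "y"])}
```

A whole program passes when every recognizer in it passes, e.g.
all(is_non_circular(RIGHT, n) for n in RIGHT).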

1.6  The Algorithmic Form of the Syntax Formalism

In this section we explore the difference between the use of the
syntax notation to express rules of derivation and rules of string analysis.

The discussion of the previous section shows that some forms of recursive
definition are invalid as rules of analysis; these forms may be expanded and
rearranged into the form

    <A> ::= <A> <B> | <C>

which expresses all the possible formulae rendered invalid as rules of
analysis. The strings generated by this rule of derivation are of the form
CB ... B, i.e., those strings which have n >= 0 strings of type B
concatenated at the right of a C. The algorithmic form of the definition is

    <A> ::= <C> {<B>}

This shows how the invalid recursion may be avoided.
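
The rearrangement can be sketched as a small transformation on formulae. The
encoding (each alternative is a list of variable names) and the function
names are assumptions of the sketch; when there are several alternatives of
either kind it produces a parenthesized form, which goes slightly beyond the
single case shown above.

```python
def split_left_recursion(name, alternatives):
    """Separate the alternatives that begin with `name` from those that do not."""
    heads = [alt for alt in alternatives if alt[:1] != [name]]
    tails = [alt[1:] for alt in alternatives if alt[:1] == [name]]
    if not heads or any(not tail for tail in tails):
        raise ValueError("rule cannot be put in algorithmic form")
    return heads, tails

def show(alts):
    """Render a list of alternatives, parenthesized when there are several."""
    text = " | ".join(" ".join(f"<{s}>" for s in alt) for alt in alts)
    return f"({text})" if len(alts) > 1 else text

def to_algorithmic(name, alternatives):
    """Rewrite <A> ::= <A> <B> | <C> as <A> ::= <C> {<B>}."""
    heads, tails = split_left_recursion(name, alternatives)
    if not tails:                   # no recursion on the left: nothing to do
        return f"<{name}> ::= {show(heads)}"
    return f"<{name}> ::= {show(heads)} {{{show(tails)}}}"
```

For example, to_algorithmic("A", [["A", "B"], ["C"]]) yields the algorithmic
form given above.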


* For subprograms written in machine language, the rule is that
the program steps that read the heads of strings must be non-circular.
Another difference is in the interpretation of the type 2 formula; in
the algorithmic form the order of the terms is important since it is the order
in which tests are made. For example, the substitution rule

    <A> ::= b | ba

generates the two examples "b" and "ba." However, if this were taken as an
algorithm and applied to the string "ba" it would test merely the first
character "b," and finding this to be a possible value would accept it,
leaving the character "a" unscanned. Consequently the correct algorithmic
form would be

    <A> ::= ba | b

The ordering relation among the alternatives in the definiens of a type 2
formula may be expressed by the rule that if one recognizer A defines
strings that are heads of any strings defined by a recognizer B, then B
must precede A in the formula. If no ordering is imposed by this rule, then
the order can be chosen to minimize cost by testing those strings that are
frequent before those that are rare.
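
The effect of the ordering is easy to demonstrate. The sketch below handles
only alternatives that are literal strings, which is an assumption made to
keep it short; the function names are likewise hypothetical. For literals,
sorting by decreasing length is one simple way to satisfy the ordering rule,
since a proper head of a string is always shorter than the string.

```python
def alternatives(literals):
    """Type 2 recognizer over literal strings, tested in the order given."""
    def recognize(text, pos=0):
        for lit in literals:
            if text.startswith(lit, pos):
                return pos + len(lit)       # accept the first alternative that fits
        return None
    return recognize

def order_for_analysis(literals):
    """Place every literal after all literals of which it is a head."""
    return sorted(literals, key=len, reverse=True)

as_written = alternatives(["b", "ba"])                     # <A> ::= b | ba
reordered = alternatives(order_for_analysis(["b", "ba"]))  # <A> ::= ba | b
```

Applied to "ba", as_written stops after one character, while reordered scans
both.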


Remark 

The difference between the two formalisms is that in the case of the
algorithmic form a direction of scan is an essential part of the
interpretation, whereas in the substitution form no notion of scanning is
present. It is suggested that source-language syntax be expressed in
algorithmic form to avoid ambiguity; this form may always be interpreted in
substitution form (but not vice versa, as we have seen). Two forms of
algorithmic syntax are possible, according to the direction of scan; in this
note the natural order of scanning, as in reading, is assumed.
1.7  The Syntax Machine with Output


In previous sections we have discussed how to recognize texts that conform
to the rules of a syntax; the result produced by the machine has been only an
indication of validity.

An output can be generated as follows:

(1) Each comparator instruction (reading a character of the input string)
can be modified to write the character on an output tape if it is
recognized by the comparator. Such comparators that produce output
will be written with underlining. Thus in the recognizer <A> ::= a | b | c,
with a and b underlined, "a" and "b" will be written on the output tape
but not "c" whenever one is recognized by the recognizer <A>.

(2) Whenever a "True" return is made from a recognizer there will be
the option of writing a pattern of the form (p:q:r) on the current
position of the output tape. This pattern may be written in the
data portion of the "True" return instruction. The elements of this
pattern will have the interpretations:

2a. p specifies an instruction or a macro-instruction for subsequent assembly.

2b. q is a type number, specifying the manner in which the pattern
(p:q:r) will be treated by an assembler whose input is the present
output tape.

2c. r is the number of characters or character groups written on the output
tape by the recognizer.

(3) If a recognizer is named by a pattern (p:q:r) the whole output
generated by this recognizer will contribute 1 to the character count
of any recognizer using it. If a recognizer is not so named, each
unit of output generated by it will contribute to the character count
of any recognizer using it.
(4) The action of naming will be signified in the algorithmic notation
by adding the naming pattern in quotation marks at the end of the
corresponding formula. Example:

    <C> ::= x {y} "(P:Q:0)"     (y underlined)

will recognize x, xy, xyy, etc. on the input tape and generate the
corresponding patterns on the output tape, viz.,

    (P:Q:0)
    y,(P:Q:1)
    y,y,(P:Q:2)
    y,y,y,(P:Q:3)  and so on.

Note that the naming pattern has r=0 in the program.

In flow diagrams a true return with naming will be indicated by the
naming pattern, "(P:Q:0)".

(5) When a "False" return is made from a recognizer, the output tape is
repositioned to the position it had when the subroutine was entered.
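
The example of item (4) can be sketched directly. The function below is a
hand-written stand-in for the recognizer <C>, not a general implementation of
the machine with output; it assumes that y is the underlined (output-writing)
comparator, that x is read without being written, and that the output tape is
modelled as a list of units.

```python
def recognize_C(text, pos=0):
    """<C> ::= x {y} "(P:Q:0)": return (new position, output units) or None."""
    out = []                          # units written by this recognizer
    if text[pos:pos + 1] != "x":      # plain comparator: reads x, writes nothing
        return None                   # False return: no output is left behind
    pos += 1
    while text[pos:pos + 1] == "y":   # underlined comparator: copies y to output
        out.append("y")
        pos += 1
    out.append(f"(P:Q:{len(out)})")   # True return: r = number of units written
    return pos, out
```

Joining the units with commas gives the patterns listed above; for the input
xyy the result is y,y,(P:Q:2).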

1.8  The Syntax of the Output

The language of the output is particularly simple. Its alphabet is formed
from the characters of the original alphabet together with the symbols (P:Q:R)
which are written by naming. These latter are "syntactic operators" whose
operands are either characters of the original alphabet or are expressions
formed by syntactic operators.

We define recursively the class of output strings as follows:

1. All characters from the original alphabet are values of syntactic variables.

2. Let V1, V2, ..., Vr denote values of syntactic variables and (O:r) denote
syntactic operators of order r, r >= 0. Then the expression
V1, V2, ..., Vr, (O:r) is also a value of a syntactic variable. Examples are

    (O:0)
    V,(O:1)
    V,V,(O:1),(O:2)
3. The output string generated by a named recognizer is a value of a
syntactic variable.

In the processing of the output string, the values of syntactic variables
will be used to construct segments of the target language according to the
nature of the syntactic operators. The three parts of the syntactic operator
have separate purposes. P will be data; Q will tell how the data P and the
data from the R operands will be combined. Thus the output string may be
viewed as data, with the processing rules combined with it.

The output string is an example of postfix notation, similar to the prefix
notation of the logicians, but in reverse order. There is a particularly
simple algorithm to evaluate expressions in postfix notation. Let there be
a list, the push-down list L, each position of which is capable of holding
(directly or by indirect reference) the value of a syntactic variable. Then
as the output string is scanned, syntactic variables are placed in successive
positions of L until a syntactic operator appears. If the syntactic operator
is of order r, then its operands are to be found in the current last r
positions of L. The result of the evaluation of the expression specified by
the operator is then placed in the first of these positions, say position m,
and the process continued, with the next syntactic variable being read into
position m+1; or, if an operator is next read, its operands will be in the
positions m-q+1 through m (q is the order of the operator).

For example, if the string to be processed is V1, V2, (O1:1), (O2:2)
the successive configurations of the list L will be

    (1)  L1 = V1.
    (2)  L1 = V1,  L2 = V2.
    (3)  L1 = V1,  L2 = O1(V2),     by application of O1.
    (4)  L1 = O2(V1, O1(V2)),       by application of O2.
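
The evaluation algorithm can be sketched as follows. The token encoding is an
assumption of the sketch: operands are strings and an operator of order r is
a pair (name, r). "Evaluation" here merely builds a text showing which
operator was applied to which operands, since the real processing rules
belong to the assembler.

```python
def evaluate(tokens):
    """Scan a postfix string, returning the final contents of the list L."""
    L = []
    for tok in tokens:
        if isinstance(tok, tuple):            # syntactic operator of order r
            name, r = tok
            if r > len(L):
                raise ValueError(f"{name} needs {r} operands, found {len(L)}")
            first = len(L) - r                # position m of the text (0-based)
            operands = L[first:]
            del L[first:]
            L.append(f"{name}({','.join(operands)})")   # result goes to position m
        else:                                 # value of a syntactic variable
            L.append(tok)
    return L
```

With the tokens "V1", "V2", ("O1", 1), ("O2", 2) the list passes through the
four configurations shown above and ends holding the single value
O2(V1,O1(V2)).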

1.9  The Algebra of the Algorithmic Syntax


Let Xi (i = 1, 2, ...) stand in place of the forms a, <A>,
{<A>}, "(a:b:0)", i.e., in place of recognizer (or comparator)
symbols, iterated recognizer symbols and naming symbols. Then the standard
formulae become

    X0 = X1 X2 X3 ... Xn             from type (1) and (3) formulae
    X0 = X1 | X2 | X3 | ... | Xn     from the type (2) formulae.

The operations of this algebra are concatenation and | . It is easily
verified that there are no commutative laws, but associative and
distributive laws hold, thus

    X1 | X2 | X3  =  (X1 | X2) | X3  =  X1 | (X2 | X3)
    X1 X2 X3      =  (X1 X2) X3      =  X1 (X2 X3)
    (X1 | X2) X3  =  (X1 X3) | (X2 X3)
    X1 (X2 | X3)  =  (X1 X2) | (X1 X3)

The distributive laws are important as they provide for possible
economization.

One particular form of parenthesized-syntax notation is of importance
because (in this case only) the parentheses do not imply an internal subroutine
for the bracket. This might be called normal-concatenated form, of which an
example is

    X1 X2 (X3 | X4 | X5) X6 X7 (X8 | X9) X10

The flow diagram for this is

[flow diagram not reproduced]
The general form of the normal concatenated form is
A1 A2 ... An where the A are either single symbols or are of the form
(B1 | B2 | ... | Bm) where the B are also single symbols.

The other normal form, for example W = X1 | X2 X3 | X4 ..., requires all
alternates which are concatenates (except for a concatenate in the last
position) to be constructed as subroutines. The above form must be
programmed as Z = X2 X3, W = X1 | Z | X4 ... An exception is made for
concatenated pairs, where the second member stands for a naming operation.

All these rules follow from the interpretation of the notation. For
example, consider (X1 X2) | X3, where X2 is not a naming operation. This
program tests a string using X1; if this succeeds, then X2 is applied to
the next part of the input string. If X2 fails, the string must be
repositioned so that the alternate test X3 may be correctly applied. This
can be done only by making (X1 X2) a subroutine (whose False exit will do the
repositioning).

Identity and Infinity Symbols

The notation may be enriched by the addition of three symbols Λ, V
and ∞, corresponding to comparators which have respectively

    Λ : no false exit, does not read the input.
    V : no true exit, does not read the input.
    ∞ : no exits at all.

The first two of these symbols are the identity elements for concatenation
and alternation. They allow certain transformations to be made in expressions
of the notation, according to the rules given at the end of this section. For
example,

    Z = X1 X2 | X1 = X1 X2 | X1 Λ = X1 (X2 | Λ)    by the distribution law.

X2 | Λ is a recognizer with its false exit joined to its true exit.


The following equations hold:

    Λ X          =  X
    X Λ          =  X
    V | X        =  X
    X | V        =  X
    V X          =  V
    Λ | X        =  Λ
    [illegible]  =  Λ
    [illegible]  =  ∞
    {X V}        =  [illegible]
    {X | V}      =  {X}
    {V}          =  Λ
    {Λ}          =  ∞
1.10  Examples

These examples are for the pseudo-machine with no output. They are
comparable with the descriptions of ALGOL 60 1/.

(1) Programs in problem-oriented languages are usually written as sequences
of statements: there may be several types of statement.

    <Program>   ::= <Statement> {<Statement>}
    <Statement> ::= <Statement 1> | <Statement 2> | <Statement 3>

This states that a program is composed of a sequence of statements,
and that there is at least one statement in the sequence. There are 3
types of statement. Each statement type would, of course, be defined
in terms of simpler syntactic variables and, in the limit, in terms of
the alphabet. The application of <Program> to a string will determine
whether the string is an example of a text in the language.

(2) Consider algebraic prefix notation using +, *, / as binary operators
and - as a unary operator. Then <E> is the recognizer for the
notation, where

    <E> ::= <A> | <B> | <C> | <D> | <V>
    <A> ::= - <E>
    <B> ::= + <E> <E>
    <C> ::= * <E> <E>
    <D> ::= / <E> <E>

<V> is a recognizer for variables and constants.

This example shows the use of recursive definition, and it is easily
shown to be non-circular. It may be written in normal concatenate form as

    <E> ::= <A> | <V> | ( + | / | * ) <E> <E>

where <A> ::= - <E>

The parentheses are used in this example as characters in the syntax
language: it is assumed that they will not occur in the text analyzed.
Note that in this example the ordering of the alternates is not
important.

(3) Normal Algebraic Notation

We repeat example (2) but now using the more usual infix notation,
with the operators as binary connectives.

<E>   ::= <F> {<S>}          3.1
<S>   ::= <+-> <T>           3.2
<F>   ::= <T> | <S>          3.3
<+->  ::= + | -              3.4
<T>   ::= <A> | <V>          3.5
<A>   ::= <V> <*/> <T>       3.6
<*/>  ::= * | /              3.7


The notation may be extended to include parenthetical notation in the
text by replacing <V> by <W> in 3.5, 3.6, and adding two more
lines.

<W>   ::= <V> | <(E)>        3.8
<(E)> ::= ( <E> )            3.9
Remarks

3.1 says that an algebraic expression is composed of a first part
<F>, followed by an indefinite number of subsequent parts <S>, which are
additions or subtractions of terms <T>. 3.3 says that the first part is
either a signed term <S> or an unsigned term <T>. By 3.5 <T> is either
a product-quotient form <A> or merely a single variable <V>; it is
important to test <A> before <V>, since <V> occurs as the first element in <A>.
Suppose the order of 3.5 had been changed. Then

<T> ::= <V> | <A>
      = <V> | <V> <*/> <T>       from 3.6,

which reduces to

<T> ::= <V>

by the distribution law and the laws of the algebra. Something is wrong with
this, as was to be expected.
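
The effect of the ordering in 3.5 can be sketched with a recognizer that tries
its alternates in order and accepts the first one that succeeds. The Python
below is an illustration under stated assumptions, not the syntax machine
itself: <V> is taken to be a single letter, and the names `match_v`, `match_a`,
`match_t` and the flag `a_first` are hypothetical.

```python
def match_v(text, pos):
    """<V>: a single letter (an assumption made for this sketch)."""
    if pos < len(text) and text[pos].isalpha():
        return pos + 1
    return None


def match_a(text, pos, a_first):
    """<A> ::= <V> <*/> <T>"""
    p = match_v(text, pos)
    if p is None or p >= len(text) or text[p] not in "*/":
        return None
    return match_t(text, p + 1, a_first)


def match_t(text, pos, a_first=True):
    """<T> with ordered alternates; the first alternate that succeeds wins."""
    if a_first:                          # 3.5 as written: <T> ::= <A> | <V>
        end = match_a(text, pos, a_first)
        return end if end is not None else match_v(text, pos)
    end = match_v(text, pos)             # reversed order: <T> ::= <V> | <A>
    return end if end is not None else match_a(text, pos, a_first)
```

With the alternates in the order of 3.5, `match_t("X*Y/Z", 0)` consumes the
whole term; with the order reversed it stops after the first variable, which
is the reduction of <T> to <V> noted in the remarks.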
Part 2. Applications of the Syntax Machine

In the first part of this report the syntax machine was defined and its
properties discussed. Now we go on to discuss its application, and in so
doing we see what are the desirable and necessary properties of an assembly
program which can process the output from the syntax machine. The whole
translation will be a multi-stage process in which syntax analysis alternates
with assembly operations. The assembly operations construct new strings
which may then undergo syntactic analysis. How many times this has to be
done will depend on the source language. Whether the alternation of syntax
analysis and assembly is made over segments of the text or over the whole text
depends also on the source language and on the amount of storage that may be
available for intermediate strings.

For example, any language that contains declarations will require several
alternations between syntax and assembly processes. Consider how names are
used for different types of numbers, e.g., fixed-point and floating-point
numbers. If the distinction between these classes of numbers is made by a
declaration, rather than by properties of the names themselves (e.g., by
defining integer variable names to be those that begin with I, J, K), the
declarations must be used to form tables of the names of each class. These
tables must then be consulted to find the syntactic properties of the objects
named, whether they are integer variables, or are functions, and so on.

This table lookup feature is not a property of the syntax machine as
described; it is proposed that this should be part of the assembly processes.
Syntax analysis is, however, usually sufficient to separate names from operator
signs, since it is unusual for the syntax of names to change within segments
of a program. Thus, the strategy for translation would be

(a) Use the syntax analyzer to discover the names and operator signs in
segments of the text.

(b) In the output there will be values of syntactic variables corresponding
to names, literal constants and other character groups. For example,
the name ABC will appear on the output from the syntax analyzer as
A, B, C, (0:3), where 0 will specify an assembly process that might
replace A, B, C, (0:3) by a "co-ordinate name" I_n, meaning that ABC
is the n'th integer-variable name. I_n will be constructed from the
position of ABC in the table of integer names, and will be stored as
a single character that will be recognized syntactically in a later
use of the syntax analyzer as a member of the class I.

(c) The syntax analyzer can then be applied to strings which now consist of
operator signs from the original text and co-ordinate names which stand
in place of the original names and literal constants.
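
Step (b) can be sketched as a table lookup that replaces each declared name by
a co-ordinate name. The Python below is a minimal illustration; the function
name, the token list and the pair ("I", n) used to stand for I_n are
assumptions of this sketch, not notation from the report.

```python
def to_coordinate_names(tokens, integer_names):
    """Replace declared integer-variable names by co-ordinate names.

    tokens        -- names and operator signs found by the syntax analyzer
    integer_names -- the table built from the declarations, in order
    A co-ordinate name is modeled as the pair ("I", n): the n'th integer name.
    """
    table = {name: n for n, name in enumerate(integer_names, start=1)}
    out = []
    for tok in tokens:
        if tok in table:
            out.append(("I", table[tok]))   # position in the table of names
        else:
            out.append(tok)                  # operator signs pass through
    return out
```

For instance, with the table ["K", "ABC"], the tokens ABC + K become
("I", 2), "+", ("I", 1); a later syntax pass then needs to recognize only the
class I, not the spelling of the names.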

These semantic considerations shall be deferred to part 3 of this report.
They are mentioned here so that it will be possible to use co-ordinate names
in this part without implying that these co-ordinate names are written in the
original text. We shall also be able to treat words like "if," "then,"
"do" and other such words as single characters of the string analyzed. This
will simplify the exposition. We shall therefore, in this part, now ignore the
interplay between syntax analysis and assembly.

2.1 Example 1. Addition and Subtraction of Floating-Point Numbers

(a) Source language syntax

<E> ::= <F> {<S>}                  1.1
<F> ::= <Vi> | <Ci> | <S>          1.2 *
<S> ::= ( + | - ) <F>              1.3

For example:   V1 + C1 - V2 .      1.4

* <Vi>, <Ci> are recognizers for floating-point variables and
constants.
(b) Syntax program with annotations

<E>  ::= <F> {<S>} "(0:v:0)"                                1.5
<F>  ::= - <T> "(CLS:a:0)" | ( + | Λ ) <T> "(CLA:a:0)"      1.6
<S>  ::= <S1> | <S2>                                        1.7
<S1> ::= + <T> "(FAD:a:0)"                                  1.8
<S2> ::= - <T> "(FSB:a:0)"                                  1.9
<T>  ::= <Vi> | <Ci>                                        1.10


Explanation:

The source language syntax defines valid strings to consist of a first
signed or unsigned term <F>, followed by an indefinite number of subsequent
terms <S>, which are signed. In step 1.5, the naming operation "(0:v:0)"
represents an assembly operation that will put together the separate terms to
form the whole expression. These terms each generate an instruction in the
machine language by naming operations such as "(CLA:a:0)" where "a" specifies
an assembly operation to combine the data portion of the naming operation, e.g.,
CLA, with the name or symbolic address of the variable or constant.

The application of the syntax program to the example 1.4 produces an
output string

V1, (CLA:a:1), C1, (FAD:a:1), V2, (FSB:a:1), (0:v:3)          1.11

By virtue of the step 1.10 the names of variables and constants are copied
from the input to the output strings: these are the only characters so copied.
The choice of machine instruction is made in S1 and S2 from the signs + or -
on the input string, but these signs do not appear in the output, being replaced
by the corresponding machine instructions from the naming operations.

When the string 1.11 is assembled, two processes occur:

(1) Combination of a symbolic address with a machine instruction,

V, (OP:a:1) --> OP V , and

(2) Combination of several segments of code (here 3 separate machine
instructions) into one segment,

σ1, ..., σr, (0:v:r) --> σ1 σ2 ... σr .

Such assembly operations convert 1.11 into 1.12,

CLA V1
FAD C1          1.12 *
FSB V2

which is in the target language.
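
The two assembly processes can be sketched as a loop over the postfix output
string with a push-down list. The Python below is a minimal illustration, not
the assembler of this report; the representation of an operator as a triple
(data, kind, degree) and the function name `assemble` are assumptions.

```python
def assemble(output_string):
    """Process a postfix output string with a push-down list.

    Items are either names (str) or operator triples (data, kind, degree).
    """
    stack = []
    for item in output_string:
        if isinstance(item, str):
            stack.append(item)                        # symbolic address
            continue
        data, kind, degree = item
        # take the operator's arguments off the list, oldest first
        args = [stack.pop() for _ in range(degree)][::-1]
        if kind == "a":                               # V, (OP:a:1) --> OP V
            stack.append([f"{data} {args[0]}"])
        elif kind == "v":                             # join r segments into one
            stack.append([line for seg in args for line in seg])
        else:
            raise ValueError(f"unknown assembly operation {kind!r}")
    return stack


STRING_1_11 = ["V1", ("CLA", "a", 1), "C1", ("FAD", "a", 1),
               "V2", ("FSB", "a", 1), ("0", "v", 3)]
```

Applied to `STRING_1_11`, the sketch leaves a single segment on the push-down
list holding the three instructions of 1.12 in order.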

2.2 Example 2. Extension of Example 1 to Include Storage Operations

Example 1 may be extended to include simple assignment statements,
so that statements like V = V1 + C1 - V2 may be translated.
We give two examples, where only one assignment of a value is made, and where
many variables may be assigned the same value, as in V1 = V2 = ... .

(a) Single assignment.

<H> ::= <G> <E> "(0:b:0)"          2.1
<G> ::= <Vi> = "(STO:a:0)"         2.2

Here <G> represents the assignment part "V =". The = sign is not
transmitted to the output string, being replaced by the naming data. The
two parts of the assignment statement are <E>, which is the <E> of
example 1, and <G>. The naming operation "(0:b:0)" will combine these
so that the assignment follows the calculation: it should always have
two arguments which are blocks of code to be interchanged.

* The meanings of the machine functions are:

CLA : clear the accumulator and place the quantity addressed
      in the accumulator.
CLS : clear and subtract.
FAD : add into the accumulator, using floating-point arithmetic.
FSB : subtract from the accumulator, floating point.
(b) Multiple assignment

<H> ::= <I> <E> "(0:b:0)"          2.3
<I> ::= <G> {<G>} "(0:v:0)"        2.4

where <G> is as in 2.2. Step 2.4 says that there may be one or more
assignments, which are grouped by an assembly operation "v" before
being interchanged, according to 2.3, by "b."

2.3 Example 3. Arithmetic expressions using +, -, *, / and parentheses

We use the IBM 709 as the target machine. In this machine, as in many
others, there are two registers concerned with multiplication and division.
One register, the AC, is concerned with addition and subtraction, and holds
the result of a multiplication; in it must be placed the numerator before
division. The other register, the MQ, holds the result of a division; in it
is also placed one of the factors of a product before multiplication.
Consequently, there are certain forms for which it is unnecessary to use
intermediate storage; for floating-point arithmetic these are

(a) + X*Y/Z* ... , where multiplication and division alternate.

(b) + X*Y/ ... /U*W ± S + T ... , where multiplication and division
alternate in the first term, the last operator in the first term
is * and then follows addition or subtraction.

(c) (+X/Y* ... *Z + A ...)/U ... , where a parenthetic expression
will provide a result in the AC, which is the numerator for a
division.

For problems of this sort we must use the machine instruction programming
for the syntax machine. We shall see here the use of the marking and
sensing operations, M(X) and K(X), which allow notes to be kept of where
intermediate results are to be found at the various stages in the object program.

In devising programs of this sort, it is fruitful to consider the states of the
target machine as it would obey the program we wish to generate. There will be
states in the syntax program corresponding to the states in the object program
being generated; these states in the syntax program will be the states at the
commencement of the program steps (or on lines in the flow chart). Sometimes,
a state of the syntax machine will also be represented by marks placed by M
operations, for later sensing by K operations.

There are four principal states of the target machine called A+, A-, Q+
and Q-, when the AC (or the MQ) is holding positively (negatively) the result
of a partial evaluation of the expression. Correspondingly named states exist in
the syntax machine. These four bit-symbols are used by M and K operations,
and are also used as labels in the flow diagram. An example of the use of this
notion of states in the object machine occurs in the scanning of the expression
"-X*Y+Z". This is analyzed by the syntax program as -(X*Y-Z), since we can
only form products positively in the AC, and may be able to absorb the negative
sign on -(X*Y-Z) in a later operation, so that A + (-X*Y+Z) can be computed as
A - (X*Y-Z), for example. The states that occur during the computation (and
during the syntax analysis) are

text:          -X*    Y     +Z
states:        Q-     A-    A-
output form:   X*     Y     -Z

and since the end state is A-, the object program will produce the negative
of -X*Y+Z. There will be a mark, A-, in the marker part of the push-down
list, so that it can be subsequently recognized that a program to evaluate the
negative has been constructed. In general, the process brings negation from
the inside of parentheses to the outside; at the worst, therefore, it will only
be necessary to provide a change of sign for any parenthesized expression, and
then only for the complete expression and not for any of its parts. Indeed the
only occasion when a negated result will be produced may be discovered by the
application of the rules:
(1) A variable or constant has parity +1.

(2) A parenthesized expression has the parity of its first term.

(3) If the first term of an expression is of product-quotient form,
its parity is given by rules (4), (5) and (6). Otherwise, the
parity is +1.

(4) If the first of the multiplication or division operators is "*",
then the parity of the term is the evaluation of the term
(including leading + or - signs) using the parities of the
components as values.

(5) If division comes first, and the first numerator is a variable
or constant, then the parity is the evaluation of the term using
parities, taking that part to the right of the first "/" sign only.

(6) Otherwise, proceed as in rule (4), but with "/" instead of "*".

If the parity of the expression is -1, its negative will be produced.

The target-machine instructions used are

LDQ   load the MQ register.
FMP   multiply the number in the MQ by the number in the
      specified address. The result appears in the AC.
FDH   divide the AC by the number from storage: the quotient
      appears in the MQ.
XCA   interchange the contents of the AC and MQ.
FAD   add to the AC.
FSB   subtract from the AC.
CLA   clear the AC and add.
CLS   clear the AC and subtract.
STO   store the AC.
STQ   store the MQ.

In the course of evaluation it is sometimes necessary to store intermediate
results: for this purpose the assembly process following syntactic
analysis must be able to generate the address of a working location. The
syntactic operator (D:c:0) does this, where D will be the machine instruction
required to store the result. If a parenthetical expression, say (A-B),
requires the result to be stored, the corresponding output produced by the
syntax analyzer will be

... , A, (CLA:a:1), B, (FSB:a:1), (STO:c:0), (0:v:3), ... .

(STO:c:0) will obtain a working space location, say W, and construct the
instruction STO W, leaving it in the push-down list of the assembler so that
when the operator (0:v:3) is processed it will have as arguments the three
assembled single instructions (in this case) CLA A, FSB B, STO W. (0:v:3)
assembles this into a block of code, placing the name of the result W in the
push-down list. Later W will be combined with a machine instruction by an
operator of type (D:a:1), at which point W could be returned to the list of
addresses available for use as working space.
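
The handling of working locations can be sketched as follows. This Python is
an illustration under stated assumptions: the pool of locations W1, W2, W3,
the triple form (data, kind, degree) of an operator, and the choice to emit
the assembled block into a separate `program` list are all hypothetical and
are not taken from the report.

```python
def assemble(output_string):
    """Assemble a postfix string that may claim working locations."""
    free = ["W3", "W2", "W1"]        # hypothetical pool of working locations
    in_use = set()
    stack, program = [], []
    for item in output_string:
        if isinstance(item, str):
            stack.append(item)
            continue
        data, kind, degree = item
        args = [stack.pop() for _ in range(degree)][::-1]
        if kind == "a":              # name, (D:a:1) --> instruction "D name"
            name = args[0]
            if name in in_use:       # a working location has been consumed
                in_use.discard(name)
                free.append(name)    # return it to the available list
            stack.append((f"{data} {name}", None))
        elif kind == "c":            # (D:c:0): claim a location W, build "D W"
            w = free.pop()
            in_use.add(w)
            stack.append((f"{data} {w}", w))
        elif kind == "v":            # emit the block, leave the result name
            program.extend(line for line, _ in args)
            results = [w for _, w in args if w is not None]
            if results:
                stack.append(results[-1])
    return program, stack, free


A_MINUS_B = ["A", ("CLA", "a", 1), "B", ("FSB", "a", 1),
             ("STO", "c", 0), ("0", "v", 3)]
```

After `A_MINUS_B` the push-down list holds only the name W1; a following
operator such as (FAD:a:1) turns it into FAD W1 and releases W1 for reuse.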

The syntax program flow diagrams follow. <E> is the recognizer for
arithmetic expressions; successful outcome will be marked in the syntax machine
push-down list by A+, A-, Q+ and Q- according as the result in the object
machine would be in the AC (positively or negatively) or the MQ (positively or
negatively).
[Flow diagrams: syntax program for <E> and its subroutines. The legible
definitions are:]

<V->  ::= <V> "(CLS:a:0)"
<V+>  ::= <V> "(CLA:a:0)"
<XCA> ::= "(XCA:a:0)"


2.4 Example 4. Assignment Statements using the Expressions of Ex. 3

We treat assignment statements like A=B= ... C= E where E is an expression
of the type <E> of the previous example.

The source syntax is

<AS>  ::= <V> = <AS1>
<AS1> ::= <E> | <AS>

At this point we could merely treat Ex. 4 in the same manner as Ex. 2.
A feature of this type of treatment for this case would be that we have to
decide which type of storage instruction to use according to the mode A+, A- or
Q+, Q- of the right-hand side. In Ex. 2, it was possible to know what type of
storage instruction was required as soon as the "=" was scanned. In Ex. 4,
this is not so. It could be assumed that the mode was A+, say, and the
right-hand side scanned. If the assumption were correct, the assignment
statement could be constructed. If not, another assumption could be tried, and
the assignment statement re-scanned. This might have to be repeated before a
correct assumption is made.

In example 3 the necessity for multiple scanning is largely avoided by the
use of state markers: in example 4, to save multiple scanning we require new
apparatus, which may be a part of the assembly process rather than the syntax
machine. We must have some process of re-ordering so that the names of the
variables on the left of the = sign may be combined with functions that can
be specified only after the right-hand side of the assignment statement has been
scanned. Recall that the symbols copied from the input to the output tapes of
the syntax machine are in the same order on both tapes. For the statement A=E
we can most simply generate an output A, (E), F where (E) stands in place of
the string generated by the right-hand side, and will eventually in the
assembly process be represented by a single level of the assembly push-down
list. The symbol F stands for a syntactic operator, or set of syntactic
operators, which, because their generation by the syntax machine follows the
generation of (E), can be made to depend on the mode of (E).

The primitive operator that we seek is a sort of interchange operator
"(0:e:0)" of actual degree 2, but appearing with 0 as its ostensive degree.
To use it, and to preserve the well-formed nature of the postfix notation at
all stages of its processing, we require a null syntactic variable Λ0. The
action of this operator is defined by the transformation

(D), (E), (0:e:0) --> Λ0, (E), (D) ...          2.4.1

in the assembler's push-down list. The null symbol Λ0 will not occur as
an argument of all syntactic operators; it will occur as an argument of
(0:v:0) but not of (0:a:0).


The flow diagram for the assignment statement follows.

[Flow diagram: syntax program for <AS>, with subroutines <AS1> and <CHS>.
The legible definitions are:]

<AS>  ::= <V> = <AS1> "(0:v:0)"
<INT> ::= "(0:e:0)"
<STO> ::= <INT> "(STO:a:0)"
<STQ> ::= <INT> "(STQ:a:0)"

If this program is applied to the assignment statement "B=C/D",
the resultant output is

B, C, (CLA:a:1), D, (FDH:a:1), (0:v:2), (0:v:1), (0:e:0), (STQ:a:1), (0:v:3).

When the "interchange" operator comes to be processed by the assembler, the
assembler's push-down list contains (or refers to)

Position:   m      m+1       m+2
Contents:   B      CLA C     (0:e:0)
                   FDH D

which changes by the "interchange" operator to

Position:   m      m+1       m+2
Contents:   Λ0     CLA C     B
                   FDH D

at which point B is now available as the argument for the operator (STQ:a:1)
which converts position m+2 of the push-down list to STQ B. The last operator
then completes the evaluation of the program.
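
The processing of this output string, including the interchange operator, can
be sketched in Python. This is a minimal illustration, not the assembler of
the report: the triple form (data, kind, degree) of an operator, the string
"LAMBDA0" standing for the null symbol, and the function name are assumptions.

```python
NULL = "LAMBDA0"   # stand-in for the null syntactic variable


def assemble(output_string):
    """Assemble a postfix string containing "a", "v" and "e" operators."""
    stack = []
    for item in output_string:
        if isinstance(item, str):
            stack.append(item)
            continue
        data, kind, degree = item
        if kind == "e":
            # ostensive degree 0, actual degree 2:
            # (D), (E), (0:e:0) --> NULL, (E), (D)
            e_block = stack.pop()
            d_name = stack.pop()
            stack.extend([NULL, e_block, d_name])
            continue
        args = [stack.pop() for _ in range(degree)][::-1]
        if kind == "a":                  # name, (OP:a:1) --> OP name
            stack.append([f"{data} {args[0]}"])
        elif kind == "v":                # join blocks; the null symbol adds nothing
            stack.append([line for seg in args if seg != NULL for line in seg])
    return stack


B_EQUALS_C_OVER_D = ["B", "C", ("CLA", "a", 1), "D", ("FDH", "a", 1),
                     ("0", "v", 2), ("0", "v", 1), ("0", "e", 0),
                     ("STQ", "a", 1), ("0", "v", 3)]
```

The interchange brings B to the top of the push-down list only after the code
for C/D has been assembled, so the store instruction (here STQ, because the
quotient is in the MQ) can be chosen after the right-hand side has been scanned.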

2.5 Example 5. Simple Relational Expressions

Here we consider relational expressions such as X > 0, X > Y and so on,
where the general form is E1 Op E2, where E1, E2 are expressions which have
values which are numbers and Op is a relational infix operator specifying a
condition that holds or does not hold between the values of E1, E2. The
result of the operation is a binary value, which we shall take to have the
following interpretation.

(a) If the condition of the relation is satisfied, the object program is
to branch;

(b) if the condition is not satisfied, the branching operation is to be
ineffective.
We shall consider only those relations where E1 Op E2 is equivalent
to E1 - E2 Op 0, e.g., where Op is the relational operator >, ≥, etc.
The object program that results will be a computation of E1 - E2 followed by
a branching instruction. The program branches if the test is satisfied.

We shall consider first the case E2 = 0, and then treat the more general
case. In anticipation of the next example, we shall provide a means of
complementing the relation during syntax analysis, so that, for example, X = Y
could be translated as if X ≠ Y had been the text.

For the special cases, the initial translation from the original names
to the co-ordinate names can recognize the character pairs =0, ≠0, etc., and
translate them by single characters =', ≠', etc. These characters will now
distinguish the special cases.

Then the syntax program for the recognition of simple relational expressions
is an extension of the program for recognizing arithmetic expressions,
which is used to scan the arithmetic expression part of the relational
expression. The appearance of the relational operator forces an exit from that
recognizer, whereupon the appropriate branching instruction can be added to the
output according to the type of relational operator. We shall give an example
for translation to the IBM 709 for the operators =' and ≠'.

The syntax program follows: it uses a new assembly operator (D:d:0)
which constructs a branching instruction with machine instruction code D, and
notes in the assembler's push-down lists that the constructed instruction lacks
a transfer address which must be filled at some later time in the assembly.


[Flow diagram: syntax program for the simple relational recognizer <R>.
The legible definitions are:]

<TZE> ::= "(TZE:d:0)" *
<TNZ> ::= "(TNZ:d:0)"
<XCA> ::= "(XCA:a:0)"

The complementary recognizer <R̄> is similar to <R> but with the
comparators =' and ≠' interchanged; it can therefore be constructed
with much in common with <R>.

For the general case E1 Op E2, the strategy for constructing a recognizer
is to analyze the expression as in example 3 until the relational
operator is encountered. At this point a chain of comparators can be used
to test for each relational operator, and make a mark in the syntax machine's
push-down list using an M operation; the state of the recognizer <E>
(i.e., A+, A-, Q+, Q-) may then be tested so that <E> may be entered again
(but not at its normal entry point) to complete the recognition and
corresponding program generation for the expression -E1+E2 **. That is, the
syntax machine is programmed to read E1 Op E2 and provide an output as if it
had been reading the arithmetic expression -E1+E2. This is achieved by entering
<E> for the second time at the position (in the flow diagram of Example 3)
A- (or A+, Q+, Q-) if the output state of <E> on its first use had been
A+ (or A-, Q-, Q+ respectively). On the exit from <E> for the second
time it is possible to add the appropriate branching instruction, since the
specification of the relational operator has been preserved by a marking
operation.

* Two IBM 709 machine instructions have been introduced, namely
TZE, transfer control if the AC is zero.
TNZ, transfer control if the AC is not zero.

** For this process to be effective, the expression E2 must be signed;
this necessary sign can be added in the preliminary scan, just as
the characters =0 were replaced by =' for the simpler case.
Thus X=Y should be transformed to X=+Y.
2.6 Example 6. Combinations of Relations

In this example we treat combinations of relational expressions using the
Boolean operators "and," "or" and "not." In so doing, we introduce a
novel algorithm for the analysis of logical expressions by use of the syntax
machine.

In examples 1 to 4 we were translating programs which did not have branch
points in their control sequencing so that the object program was obeyed
sequentially. In example 5, we had object programs with a branching operation.
Now we combine programs that have branching.

We define a program block as a block of object program which is an assembled
single instruction of object code or a block of code assembled from program
blocks. Program blocks may be conditional, when they have one skip exit in
addition to the exit of normal (sequential) sequencing, or they may be
unconditional, lacking the skip exit. Within a conditional program block there
may be many branching operations, but the block as a whole has one skip exit.
Program blocks may also be labeled, but by one label only.

For example 6, we need three assembly operators for combining conditional
program blocks. These are (0:v:0), (0:w:0) and (0:x:0). The first of these,
(0:v:0), has been used before without all its properties being announced; it
combines those program blocks which are its arguments into one program block
whose skip exit is the common skip exit of the argument blocks. If all the
arguments are unconditional, the result is also. At most one of the arguments
may be labeled, which label (if any) is the label of the combination.

The operator (0:w:0) has two operands, which are program blocks. If (A),
(B) stand in place of program blocks, the block (A), (B), (0:w:2) is the
combination of the blocks (A), (B) (in that sequence) with the skip exit of (A)
joined to the label of (B). The conditionality of the result depends on the
conditionality of (B); the result is labeled by the label of (A).

The third operator is a labeling operator (0:x:0), which has one operand,
which must be an unlabeled program block. It provides a label for the block
so that a transfer of control could be made to skip over the block.

The diagrams for these operators are

[Diagrams: the blocks A, B, (0:v:0); A, B, (0:w:2), with its skip exit; and
A, (0:x:1).]


They provide the mechanism for realizing conditional expressions. For example,
if p(A) is the proposition that the skip exit is the actual exit from A,
when program A is run, then

p(A, B, (0:v:2)) = p(A) ∨ p(B)

p(A, B, (0:x:1), (0:w:2)) = ¬p(A) ∧ p(B)

The operators are chosen so that the normal exit from the first program block
is the normal entry to the second program block. Thus the program blocks may
be assembled in position before the connecting operators (0:v:0) and (0:w:0)
have been reached. Together with negation, these operators enable binary
decision programs to be written for any Boolean function. Moreover, if the
logical operators ≡ and ≢ are not used, the Boolean function can be
re-written by changing the operators only, without duplication or change of
ordering of the predicates or program blocks.
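
The behavior of the two combinations can be checked with a small model in
which a program block is a function that returns True when its skip exit is
taken. The Python below is a sketch under that assumption; it reads the second
combination as "the skip exit is taken when A does not skip and B does",
following the description of (0:w:0) and (0:x:0), and all function names are
hypothetical.

```python
from itertools import product


def combine_v(a, b):
    """A, B, (0:v:2): common skip exit; B is only reached if A did not skip."""
    return lambda env: a(env) or b(env)


def combine_xw(a, b):
    """A, B, (0:x:1), (0:w:2): the skip exit of A jumps over B to B's label."""
    return lambda env: (not a(env)) and b(env)


def negate(r):
    """Complementary recognizer: the branch is taken when r does not hold."""
    return lambda env: not r(env)


def truth_table(block):
    """Evaluate a combined block for every value of the relations R1, R2."""
    return {(p, q): block({"R1": p, "R2": q})
            for p, q in product([False, True], repeat=2)}


r1 = lambda env: env["R1"]
r2 = lambda env: env["R2"]

OR_TABLE = truth_table(combine_v(r1, r2))              # "R1 or R2"
AND_TABLE = truth_table(combine_xw(negate(r1), r2))    # "not R1 w R2"
```

In this model "R1 or R2" is obtained directly from the v combination, and
"R1 and R2" from the w combination applied to the complement of R1, without
reordering or duplicating the relations.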


We are now in a position to write a translation algorithm for the
source language string defined by

<CR> ::= <C1> or <CR> | <C1>
<C1> ::= <C2> and <C1> | <C2>
<C2> ::= <R> | not <R> | ( <CR> )

where R are simple relations of the form E1 Op E2 as treated in the
previous example.

The analysis is made in terms of the operators (0:v:0) and (0:w:0),
or rather in terms of the corresponding logical operators. Because the input
text is written using "and," but the analysis is made in terms of "w," we
require complementary pairs of recognizers so that terms like "R1 and R2"
may be translated to "not R1 w R2". In this example we have to apply the
complementary recognizer to the first operand so that "R1" is translated as
if "not R1" had appeared on the input string instead of "R1". The use of
De Morgan's rules also allows the "not" operations to be passed inside
parentheses so that in the translation they apply only to the simple relational
expressions.


The  syntax  program  <  CR  >  follows 


2.7  Ebcample  7.  Simple  Branching  Instructions 


We  deferred  from  example  5  the  matter  of  how  to  write  certain  branching 
instructions  which  have  no  counterparts  as  single  instructions  of  the  machine’s 
code.  For  example,  on  the  IBM  709  to  test  that  the  contents  of  the  accumulator 
is  greater  than  or  equal  to  zero,  we  must  first  test  for  zero  and  then  for 
positive accumulator. This is because the number representation is by sign
and  absolute  value,  and  the  branching  instructions  operate  on  the  sign  (  TPL  = 
transfer  on  positive  or  TMI  =  transfer  on  minus)  or  on  the  absolute  value  of 
the  accumulator  (TZE  =  transfer  on  zero  or  TNZ  =  transfer  if  not  zero). 
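The need for two tests can be seen in a small model of a sign-and-absolute-value accumulator. This is a sketch under stated assumptions: the class Acc and the function names are hypothetical, and the predicates only model the conditions on which TZE, TPL and TMI transfer, not the instructions themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acc:
    negative: bool   # sign position
    magnitude: int   # absolute value, never below zero


def tze(a):  # condition for "transfer on zero": looks at the magnitude only
    return a.magnitude == 0


def tpl(a):  # condition for "transfer on plus": looks at the sign only
    return not a.negative


def tmi(a):  # condition for "transfer on minus": looks at the sign only
    return a.negative


def ge(a):   # greater than or equal to zero: zero OR plus
    return tze(a) or tpl(a)


def le(a):   # less than or equal to zero: zero OR minus
    return tze(a) or tmi(a)


def gt(a):   # strictly positive: NOT zero AND plus
    return (not tze(a)) and tpl(a)


def lt(a):   # strictly negative: NOT zero AND minus
    return (not tze(a)) and tmi(a)


minus_zero = Acc(negative=True, magnitude=0)
print(tpl(minus_zero), ge(minus_zero))
```

A zero with minus sign fails the sign test alone, which is why the test for zero has to be combined with the test on the sign.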

Thus to provide a branch on the accumulator being positive or zero we
require a TZE instruction followed by a TPL instruction, both with the transfer
address. The assembly operators introduced in the last example now make it
possible to write segments of the output string that correspond to tests for
the inequalities in the source language, as follows

Source language    Output string translation

>                  (TZE:d:0), (TPL:d:0), (0:x:1), (0:w:2)

<                  (TZE:d:0), (TMI:d:0), (0:x:1), (0:w:2)

≥                  (TZE:d:0), (TPL:d:0), (0:v:2)

≤                  (TZE:d:0), (TMI:d:0), (0:v:2)

We can now construct subroutines to provide these output strings. For
example the TGR subroutine, to test > , in Example 5 (second part), may be
written

<TGR> ::= <TZE> <T1> "(0:w:0)" , where

<TZE> ::= "(TZE:d:0)"

<T1>  ::= <TPI> "(0:x:0)"

<TPI> ::= "(TPL:d:0)"

The subroutine for providing a greater than or equal test is <TGE> , where

<TGE> ::= <TZE> <TPI> "(0:v:0)"

and <TZE> and <TPI> are the subroutines described above.


2.8 Example 8. Iteration Statements


The purpose of this example is to introduce another syntactic operator
(or assembly operator) of order 2 which will be useful in the construction
of program loops. Consider two programs A , B where A and B stand for
the syntax machine output for these programs. Program A must be a labeled
program and program B must be conditional. Then the operator (0:y:0) applied
to A , B forms a combination of A and B in that order with the skip exit
of B connected to the labeled entry point of A , as shown below.

C = A, B, (0:y:0)

The resulting program C may itself be conditional, if A was conditional,
or labeled if B was labeled. In other words C has the skip exit (if any)
of A, and the label of B (if any).

As an example, consider an iteration statement which in the source language
consists of three parts concatenated, e.g., ABC, where

A  represents  an  initialization  of  variables  (i.e.,  iterates). 

B  represents  the  calculation  of  new  values  of  the  iterates 
from  the  old. 

C  represents  an  end  test  for  the  iteration, 


so  that  the  diagram  for  the  program  is  to  be 




Clearly C is a conditional program and A must be labeled: the postfix
representation is either

D1 = A, (0:x:1), B, (0:v:2), C, (0:y:2)   ... 2.8.1   or

D2 = A, (0:x:1), B, C, (0:v:2), (0:y:2)   ... 2.8.2

according as B is first combined with A or with C. In 2.8.1
B could be a conditional program, but not labeled: in 2.8.2
B must be unconditional but may be labeled.

We  refrain  from  giving  further  examples,  as  we  now  go  on  to  consider  the 
properties  of  the  translations  that  have  been  illustrated  in  the  preceding 
examples. 

Remarks  on  Part  2 

In  examples  1  to  8  we  have  shown  various  examples  of  translation  that  the 
syntax  machine  and  a  suitable  post-assembler  can  make.  We  now  gather  together 
some  of  the  salient  features. 

The principal property of the process is that the ordering of the variables
is not changed by the translation, except by the re-ordering of arithmetic
expressions by parenthesizing and by the interchanges made by the operators
"b" and "e". Example 4 shows how the role of the interchange operator "b"
can be taken over by the operator "e", so we may consider "e" only. The
properties of "e" depend on the assembler.

The simplest assembler would be one which assembled directly into machine
code and placed each instruction into its final position. Thus "e" could be
used to effect the transformation 2.4.1, i.e.,

(D), (E), (0:e:0) → (E), (D)

only when (D) stands for the address part of an incomplete machine instruction,
where (D) is stored directly in the assembler's push-down list and not merely by
reference to an assembled set of machine instructions already located.


We hope to show in part 3 of this report how this condition on translation
may be relaxed by using the mechanism of declarations.

Another property of the object program is that no advantage has been taken
of common subexpressions to economise in the object code. It is the author's
opinion that the search for common subexpressions in algebraic formulae is a
simple matter for the composers of programs and should be left to them rather
than to the mechanical translators if it is desirable to have a quick translation.
The same may be said about many other forms of economization which could be made
unnecessary by simple rephrasing of the source program. Example 3 shows, however,
that economization in the use of arithmetical registers is possible.

The  syntax  machine  can  differentiate  many  special  cases  of  the  source- 
language  text  where  the  properties  of  the  target  machine  allow  the  use  of  program 
tricks. With some of the extensions to be proposed in part 3 of this report,
it  becomes  possible  to  recognize  many  special  cases  in  the  source  language  that 
are  of  common  occurrence,  and  to  provide  corresponding  segments  of  machine  code 
(or  macro-instructions). 

The  program  combination  operators  v,  w,  x,  y  provide  a  quite  powerful 
notation  for  combining  programs  with  branches;  in  effect  they  provide  a  method 
of writing a wide class of branched programs without using explicitly written
labels.  For  example,  in  the  iteration  2.8.2  of  example  8  the  iteration  part  B 
could  be  entered  from  some  program  other  than  the  initialization  program  A. 
3.1 Part 3. Declarations

Declarations are made about symbols used in the source program and alter
their meaning. They are used to specify which names apply to the various
classes of objects in the program, e.g., which are names of floating-point
variables, fixed-point variables, functions, procedures etc. They may also
be used to define new functions in terms of existing functions, or to define
symbols which stand in place of whole segments of text. In addition the
mechanism of declarations may be used internally in a translator.

We distinguish between two occasions where declarations affect the translator:
when a declaration is made and when a declaration is used. For example, if
we wish to use the name "ABC" as the name of a function, it must be declared
to be the name of a function. This declaration may be explicit, when a segment
of the source text says explicitly that ABC is a function, or the declaration
may be implicit, when ABC appears in such a manner that the syntax shows that
a declaration about ABC is being made as a part of another declaration, as for
example in

ABC(X,Y) = X sin (Y),

which definition might be given without any explanation in the source language,
because this form of expression could only be what it is, a definition of a
new function whose name is ABC.

The declaration is used whenever the objects named in the declarations
are used elsewhere in the text, as for example if we use the function ABC as
part of an arithmetic expression, e.g.,

Z = I + ABC(X+Y, W)


We shall discuss three types of declaration

(1) Declarations about the syntactic properties of names.

(2) Declarations which define substitutions, where a declaration is
made that a symbol stands in place of a string of symbols.

(3) Declarations about substitutions in which, when substitution
is made of a string for a symbol, the string is modified by
parameters.

3.2 Declarations about Syntactic Properties

An example of a declaration about syntactic properties would be

Integers, A, B, C1

which declares the names A, B, C1 to be the names of integer variables. We
regard the properties of names as syntactic properties, because in the analysis
of statements we must distinguish between the various types of variable, and
between the names of variables and the names of functions. Our intention is
to replace the names like A, B, and C1 by symbols like I1 and I2 which are
so constructed that the syntax machine can recognize them as the names of
integer variables. The subscripts could have uses in storage allocation.

However, we must first recognize declarations before we can act on them.

To recognize such declarations and distinguish them from other forms of
statement we assume that the Syntax Machine is analyzing programs statement by
statement. Let us suppose that there are several sorts of property for which
we wish to make declarations about names. We can start the scan of statements
by checking whether any of the leading words are signals for declarations. A
chain of comparators will do this. For example, if we have the declarations
about integer variables, functions etc., we could use the following syntax
program <SD>.


<SD>  ::= <DI> | <DF> | <ND>

where

<DI>  ::= i n t e g e r s <ZI> "(0:m:0)"

<ZI>  ::= <ZI1> [ , <ZI1> ]

<ZI1> ::= <ZI2> "(I:k:0)"

<ZI2> ::= L [ NL ]


and <DF> is similar to <DI> but begins with a chain of comparators for the
word "functions" (or its singular), and the subroutine corresponding to <ZI2>
is named by the operator "(F:k:0)". <ND> is the syntax program for statements
which are not declarations. L is a comparator for letters of the alphabet
and NL is the comparator for letters and numerals. U is a comparator for all
characters but the statement ending punctuation.


The syntactic operators are

"(0:m:r)"  Return control to the syntax machine from the assembler,
resetting the assembler push-down list so that the next
symbol placed there will be in the same position as the
first symbol used in this use of the assembler.

"(D:k:r)"  This is a combined table lookup and table constructing
operator. It constructs and uses a table of equivalences
between external and internal names. A possible
definition of this operator might be:

(a) If D=0 and the external name is not already stored,
store it in the proper place and generate a corresponding
internal identifier, placing it in the
corresponding position of the table and in the
result position of the push-down list.

(b) If D≠0, find the place in the table for the
external name and in the corresponding position
for the internal name place a generated symbol D_j,
where any name so generated may be recognized by
a comparator as an internal identifier of class D.
Place D_j in the push-down list.

(c) Otherwise, look up the external name and
place the corresponding internal name in the
result position of the push-down list.

In the use of this operator in the making of declarations,
only operation (b) would be used. Part (a) of the operator
makes it useful for dealing with the class of names about
which no declarations are made. A possible method of storing
external names is discussed by Williams (Comm. ACM 2, 6, p. 21,
June 1959).
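The three cases of the "k" operator can be sketched as a single table routine. This is a minimal illustration and not the report's implementation: the class NameTable, the use of a dictionary for the table, and the spelling of generated identifiers as a class letter followed by a counter are all assumptions.

```python
class NameTable:
    """Hypothetical table of equivalences between external and internal names."""

    def __init__(self):
        self.internal = {}   # external name -> internal identifier
        self.counters = {}   # class letter -> last subscript issued

    def _generate(self, cls):
        # Assumed scheme: subscripts come from one counter per class.
        n = self.counters.get(cls, 0) + 1
        self.counters[cls] = n
        return f"{cls}{n}"

    def k(self, d, external):
        """Apply the operator with data field d to an external name."""
        if d == "0":
            if external not in self.internal:
                # Case (a): undeclared name seen for the first time.
                self.internal[external] = self._generate("0")
            # Case (a) result, or case (c): plain lookup.
            return self.internal[external]
        # Case (b): a declaration assigns a new identifier of class d.
        self.internal[external] = self._generate(d)
        return self.internal[external]


table = NameTable()
for name in ("A", "B", "C1"):
    table.k("I", name)          # the declaration "Integers, A, B, C1"
print(table.k("0", "B"))        # later use of B in a statement
```

Keeping one counter per class means the subscript of an internal name can double as a relative address within the storage block for that class.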


In the program <DI> and corresponding programs, the return from the
program must be made in a special way. When the operator "(0:m:r)" has been
written on the output string from the syntax machine, the assembler is then
entered to evaluate the part of the output string generated by the subroutine
<DI>; after the evaluation process, control returns to the syntax machine
which is set so that further output overwrites the string which the assembler
processed.

If the program <DI> is applied to the example at the beginning of
this section the syntax machine produces an output

..., A, (I:k:1), B, (I:k:1), C, 1, (I:k:2), (0:m:3)

whose evaluation by the assembler will store the external names and generate the
corresponding internal names.

In  statements  which  are  not  declarations,  external  names  must  be  replaced 
by  their  internal  name  equivalents.  This  may  be  done  by  the  program  which  we 
shall  discuss  in  the  next  part  where  we  show  how  statements  may  be  handled  by 
a  similar  mechanism  to  substitution  declarations. 

3.3 Substitution Declarations

We now consider the type of declaration where a string in the source language
is given a name, which may thereafter stand in place of the string. There are
two sorts of replacement which we might consider: replacement in the input string,
and replacement in the output of the syntax machine. The latter is what we
shall consider, as the mechanism is useful in dealing with non-declaratory statements.

We take, as an example of this sort of declaration,

Let B1 := A=B+C

by which we define B1 to stand in place of the statement "A=B+C". As before
we can write a program with a chain of comparators that check the presence of
the word "Let" before proceeding in the manner particular to this type of


statement. The program is called <DL>.

<DL>   ::= <IdF> := [ <DLL> ] "(0:i:0)"

<IdF1> ::= L [ NL ] "(F:k:0)" ,  <IdF> ::= <IdF1> "(0:j:0)"

<DLL>  ::= <Id> | U

<Id>   ::= L [ NL ] "(0:k:0)"

where U is the comparator for all characters except end of statement
punctuation.

When this program is applied to the example the resultant string is

.... B, 1, (F:k:2), (0:j:1), A, (0:k:1), =, B, (0:k:1), +, C, (0:k:1), (0:i:6)

and because of the special treatment of subroutine returns associated with the
operator (0:i:r), this string is now evaluated by the assembler. What is to
happen is this:

(a) The external name B1 is processed by the operator (F:k:2), with the
result that the internal identifier F(B1) is placed in the push-down
list.

(b) The operator (0:j:1) is next encountered. Its operand is the
internal name generated in (a). Its purpose is to set up a table
of absolute addresses where the processed string form of the
declaration will be stored. This address will be that occupied
in the example by the character B. The table of locations of
processed strings then contains F(B1) and L(B) where L(B) is the
location of the first character of the string in process. The
result in the push-down list is a null symbol.

(c) The k operators replace A, B and C by the corresponding
internal names.

(d) The operator (0:i:r) then sets the syntax machine to work on the
result in the assembler's push-down list which is now

..., 0_A, =, 0_B, +, 0_C, (0:i:6)

where 0_A is the internal name of A and, of course, is now a single
character by which the declared syntactic properties of A may be
recognized. The syntax program starts at 0_A. The operator (0:i:6)
left in the push-down list now acts as an end of statement mark.

When the syntax machine is applied now it analyzes the string by the type
of program of which examples were given in the second part of this report.
On completion of its work, the program exits via a special true return to the
syntax program that called <DL>. This abnormal return switches the input
of the syntax machine back to the original string. This abnormal return
situation can be anticipated when control left the syntax machine for the assembler,
and the position of the input string stored.

The processing of non-declaratory statements can be done in the same way
except for the treatment of the name of the string. The syntax program is <ND>,

<ND>  ::= <ND1> [ <DLL> ] "(0:i:0)"

<ND1> ::= "(0:n:0)"

and <DLL> is as before.

If <ND> is applied to the string "A=B+C", the first output string is

(0:n:0), A, (0:k:1), =, B, (0:k:1), +, C, (0:k:1), (0:i:6)

The processing proceeds as before except for the action of the operator (0:n:0),
which is to generate an internal formula symbol G_r as its result. Otherwise,
it acts like the operator (0:j:1) in placing the internal formula symbol in the
table of processed string locations. The result is

..., G_r, 0_A, =, 0_B, +, 0_C, (0:i:6)

When this comes to be processed by the syntax machine, processing starts at the
second symbol as before, since the result of the operator (0:i:6) placed in the
push-down list is merely G_r. As a consequence, the push-down list of the
assembler contains a list of symbols, one for each statement processed. For
declarations this symbol is the null symbol; for other statements it is the internal
formula symbol. When all statements have been read the string in the assembler's
push-down list stands in place of the program, which now exists in corresponding
order as segments on the output string of the syntax machine. These segments all
end with a punctuating symbol that was added by the second pass of the syntax
machine.

The assembler also has a second pass, which is an assembly to machine-language
code. It is here that the substitution of strings for internal
formulae symbols and declared string symbols occurs. Unless a "Load and Go"
type of assembly is required this second
assembly would be done when the compiled program is loaded. Actually the loading
process would also include a syntax analysis since it is very easy to incorporate
corrections at load time by replacing whole statements. This would be done by writing a
declaration for the corrected statement, using the internal formula symbol for
the string to be corrected.

The expansion of internal formulae symbols is done by the assembler switching
its input. This may be explained as follows.

Suppose that the assembler is reading from a string S1 and finds an internal
formula symbol. The table of string locations is consulted to find the absolute
location of the first symbol of the string. The assembler takes this next,
noting that it has to return to the original string when the end of the
secondary string is reached. Clearly this process is recursive, if all the
return addresses are kept.
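The switching of the assembler's input can be sketched with an explicit stack of return addresses. The following is an illustration under assumptions: the dictionary standing for the table of string locations, the ";" end mark, and the function name expand are all hypothetical, and strings are held as Python lists rather than in absolute storage locations.

```python
def expand(start, strings, end_mark=";"):
    """Read the string named `start`, replacing each formula symbol by its string.

    `strings` maps a symbol to a list of symbols terminated by `end_mark`;
    any symbol that is a key of `strings` is treated as a formula symbol.
    """
    out = []
    returns = []               # kept return addresses: (string name, position)
    name, pos = start, 0
    while True:
        sym = strings[name][pos]
        pos += 1
        if sym == end_mark:
            if not returns:
                return out     # end of the original string
            name, pos = returns.pop()      # resume the interrupted string
        elif sym in strings:
            returns.append((name, pos))    # note where to come back to
            name, pos = sym, 0             # switch input to the secondary string
        else:
            out.append(sym)


strings = {
    "S1": ["G1", "G2", ";"],
    "G1": ["a", "=", "b", ";"],
    "G2": ["c", "=", "G3", ";"],
    "G3": ["d", "+", "e", ";"],
}
print(expand("S1", strings))
```

Because every return address is pushed before the input is switched, a secondary string may itself contain formula symbols to any depth.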

3.4  Declarations  about  Macro-instructions 

An important class of declarations is that in which macro-instructions
are defined by a declaration such as

Macro F(X,Y,Z) := X(Y+Z)

where the form on the left, namely F(X,Y,Z), is short for the expression on the
right. The macro-instruction is different from the closed subroutine in that
every time the short form is used in the program a modified copy is placed in the
appropriate part of the target program. In the definition, the parameters (i.e.,
X,Y,Z in the example) are dummy symbols.

There are two ways in which we might approach this problem: by using substitution
methods on the input string of the syntax machine or by using substitution on
the output as in the previous section. In the first method we would consider
macro-instructions to be merely shortened ways of writing parts of the source language
with the expansion to full form being made in the input string, so that, for
example, writing F(A,B,C) is completely equivalent to writing in its place the
expression A(B+C). This method has the advantage that we do not need to make any
declarations about the modes of the variables (i.e., whether the variables are
integer variables, floating-point variables etc.). The second method is more
appropriate for large sections of a program, such as the ALGOL procedures. Here
we deal with Method 1.

The  macro  declaration  is  processed  as  follows. 

On the first pass of the syntax machine the word "Macro" can be recognized
and program control switched to the program for processing the rest of the
declaration. The program scans the text and produces a string whose evaluation by the
assembler will leave the following pattern on the output string. For the example
"Macro F(X,Y,Z) := X(Y+Z)", the pattern is

Cell address   n   n+1   n+2   n+3   n+4   n+5   n+6   n+7   n+8   n+9   n+10
Contents       0   0     0     ^     n     (     n+1   +     n+2   )     ^

The overlined symbols have a special effect on the syntax machine. To
distinguish them from normal symbols, they might be negative. The first three
cells are to hold the names which will be the parameters of the macro when it is
used: the symbols ^ and ^ cause switching of the input and output of the
syntax machine; the symbols like n are address symbols, in the sense that when
n is read by the syntax machine, it acts as if it were reading the symbol from
the cell whose address is n .


The program for making the declaration is <MD>.

<MD>   ::= m a c r o <MD1> ( <MD4> ) <MD10>

<MD1>  ::= <MD2> "(0:p:0)"

<MD2>  ::= <MD3> "(0:j:0)"

<MD3>  ::= L [ NL ] "(M:k:0)"

<MD4>  ::= <MD9> [ <MD5> ]

<MD5>  ::= , <MD9>

<MD6>  ::= L [ NL ] "(0:k:0)"

<MD7>  ::= <MD8> [ <MD8> ] "(0:s:0)"

<MD8>  ::= <MD6> "(0:r:0)"

<MD9>  ::= <MD6> "(0:q:0)"

<MD10> ::= : = "(^:s:0)" <MD7> "(0:m:0)"

The application of MD to the example will yield an output string:

F, (M:k:1), (0:j:1), (0:p:1), X, (0:k:1), (0:q:1), Y, (0:k:1), (0:q:1),
Z, (0:k:1), (0:q:1), (^:s:0), X, (0:k:1), (0:r:1), (, Y, (0:k:1), (0:r:1), +,
Z, (0:k:1), (0:r:1), ), (0:s:6), (0:m:3) *

The new assembly operators are:

(0:p:1)  Switch the output from the assembler to the output list.
This ensures that the coded definition of the macro is placed on the
output string. This operator also clears out a temporary table used
by the q and r operators.

(0:q:1)  In the temporary table mentioned above place the internal name (which
is the operand) and the absolute location in which this was stored
at the time (0:q:1) was applied to it.


* As with other examples it has been assumed that the input text
contains no spaces. This simplifies the exposition.

(0:r:1)  The operand is an internal name. Look for it in the temporary table
and if it is found there, replace the operand by the absolute address
noted against it in the temporary table; otherwise the operator has no
effect. The purpose of this operator is to replace the parameters by
an address referring to the position in which the actual parameters
will be placed when the macro is used.

(0:s:0)  No matter what the operand count of this operator, write the
character from the data field in the place occupied by the operator.

The evaluation of the output of the first scan of the syntax machine causes

(1) The name of the macro to be written in the table of processed strings
together with the address ( n in the example) of the processed macro
definition.

(2) The p operator then switches the output from the assembler to the
output string.

(3) The q operators then take note of the formal parameters in the
definition, so that the r operators can replace them in the processed
string by the absolute address of the location to which the internal
names of the parameters will go when the string is used.

(4) The s operator writes a mark 0 which will switch the input of
the second scan of the syntax machine when the macro is used.

To use such a macro we have to make some extensions to the syntax machine,
so that the input can be switched from one text to a subsidiary text and then
returned to the original text. The symbols that are special in this respect are
symbols of class M denoting internal macro names, the special symbols ^
written by the operator s , and the absolute addresses written by r operators
in the processed form of the macro definition.
Now consider what happens when the syntax machine scans a text in which the
internal symbol M_F appears. This would not be in the original text so we are
talking about the second pass of the syntax machine when its input has external
names replaced by internal names. Let the original source text contain
"F(A,B,C)" where F is the macro of our example and A,B,C are names of
variables or constants. Then the corresponding string within the input for the
second pass of the syntax machine is

M_F, (, 0_A, ,, 0_B, ,, 0_C, )

where the commas are used to separate the characters of this string, and the
characters 0_A etc. are the internal character names of A etc. *

A special comparator is used for symbols of class M i.e., names of this
type of Macro. If such a symbol is recognized by a comparator, the output of
the syntax machine is switched to the address where the macro definition begins.
The syntax program then fills the parameter cells with the names of the
parameters used here, namely 0_A, 0_B and 0_C . When these have been read the syntax
machine uses yet another special comparator to check the presence on the current
output position of the symbol 0 and if it is found the input of the syntax
machine is switched to the next position of the macro-definition list (cell n+4
in the example), and the output list of the syntax machine reset to its state
before the M symbol appeared.

The syntax machine now scans the rest of the macro definition until the
symbol 0 appears when the input of the syntax machine is switched back to
what it was before the last M symbol appeared. By the usual technique of
push-down lists it is simple to make these macros recursive.
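The definition and use of such a macro can be sketched on a plain list of cells. This is an assumption-laden illustration, not the syntax program of the report: the names Addr, BEGIN, END, define_macro and use_macro are hypothetical, the two marks are given arbitrary spellings, and nested (recursive) use is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Addr:
    """Address symbol: reading it yields the contents of the cell it names."""
    cell: int


BEGIN, END = "<begin>", "<end>"   # assumed spellings of the two special marks


def define_macro(store, params, body):
    """Append the processed definition to `store`; return its first address n."""
    n = len(store)
    cell_of = {p: n + i for i, p in enumerate(params)}
    store.extend([0] * len(params))          # empty parameter cells
    store.append(BEGIN)
    # Formal parameters in the body are replaced by address symbols.
    store.extend(Addr(cell_of[s]) if s in cell_of else s for s in body)
    store.append(END)
    return n


def use_macro(store, n, actuals):
    """Fill the parameter cells, then read the body through the address symbols."""
    pos = n
    for name in actuals:
        store[pos] = name
        pos += 1
    if store[pos] != BEGIN:
        raise ValueError("wrong number of actual parameters")
    pos += 1
    out = []
    while store[pos] != END:
        sym = store[pos]
        out.append(store[sym.cell] if isinstance(sym, Addr) else sym)
        pos += 1
    return out


store = []
n = define_macro(store, ["X", "Y", "Z"], ["X", "(", "Y", "+", "Z", ")"])
print(use_macro(store, n, ["A", "B", "C"]))
```

For the example F(X,Y,Z) the definition occupies eleven cells, n to n+10, and the body begins at cell n+4, in agreement with the pattern given for the declaration.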

The  syntax  program  for  the  use  of  macros  is 


text. This should be placed in all parts of a syntax program where an M
might be under the scrutiny of the syntax machine.
Part 4. The Assembler

In parts 1, 2 and 3 much has already been said about the assembler.
We consider now only one part of the assembler, that used to assemble postfix
strings to target machine language, using the operators "a", "c" and "d"
which form single machine instructions and the operators "e", "v", "w",
"x" and "y", which manipulate program blocks.

The assembler for these operators is best considered separately from the
assembler for other operators since the push-down list requires four registers
AR_r, BR_r, CR_r and LR_r on each level r . AR_r holds the names and operators
from the postfix string being assembled. BR_r holds the assembled forms of single
instructions provided by the operators "a", "c" and "d". CR_r holds an
address which refers to a conditional machine instruction or a conditional
program block. It also holds a negative sign * if the level r is holding a
single machine instruction in BR_r. LR_r holds an absolute address which is
a transfer point generated by a label. There is also a location counter whose
contents L give the address where the assembled instructions of the program
go when transferred from the push-down list.

For this assembly it is assumed that the "k" operator which provided
internal names generated the subscripts on these names by incrementing a counter
so that the subscript is a relative address for each variable in the block for
variables of that type. The final values of these counters (one for each class
of variable) can be used to provide base addresses for each block, from which
the absolute addresses of any variable can be constructed by the operator "a".


* We assume that each register of the push-down list has a sign
position and a value position, so that representation is by
sign ( + or - ) and value.
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4.1  The  Operator.  (D:a;l) 

When this operator appears in the position AR_n, the internal name which
is its operand is in AR_{n-1}. The absolute location corresponding to the internal
name is combined with the machine instruction specified by D and the result
placed in BR_{n-1}. CR_{n-1} is made negative to show that BR_{n-1} contains a
single machine instruction. The push-down list level counter is then set to n, so
that the next item is brought into AR_n. If the operand was the name of a
working location, it is sent back to the list of working spaces (see below).
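The construction of an absolute address from a base address and a relative subscript, and its combination with the function D, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: an internal name is taken to be a pair (class, subscript), the result is assumed to be left on level n-1, and the class tag "working", the table `base` and the mnemonics in the usage note are hypothetical.

```python
def absolute_address(name, base):
    """name = (variable class, relative subscript); base maps class -> base address."""
    variable_class, subscript = name
    return base[variable_class] + subscript


def operator_a(function, levels, n, base, free_working_spaces):
    """Apply an "a" operator found in AR_n to the operand name in AR_{n-1}.

    Each level is a dict; returns the level that the next item enters.
    """
    operand = levels[n - 1]
    name = operand["AR"]
    operand["BR"] = (function, absolute_address(name, base))
    operand["CR_negative"] = True          # BR_{n-1} now holds a single instruction
    if name[0] == "working":               # hypothetical tag for working locations
        free_working_spaces.append(name)   # send it back to the list of working spaces
    return n
```

With `base = {"real": 200, "working": 500}`, applying the operator with function "CLA" to the name ("real", 3) leaves ("CLA", 203) in the BR position of the operand's level.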

4.2 The Operator (D:c:0)

This operator combines a function specified by D with a working-space
location. Associated with this operator and with operator "a" is a list of
used working spaces. If this list is empty then "c" must construct the
name of a working location, which it can do by incrementing a counter whose
initial contents was the address of the beginning of a block of storage
allocated for working space. If WS is the internal name of this working-space
variable (selected from the list, or constructed) then the result in the
push-down list is

    AR_n = WS

    BR_n = D;L(WS), i.e., the machine instruction with function D and
           address L(WS), which is the absolute location
           corresponding to the working-space name WS.

    CR_n is negative.

where the operator (D:c:0) was in AR_n. Note that the operator "c" acts
like the operator "k" in the production of an internal name. We want an
internal name to appear in AR_n because it will subsequently be used as the
operand of an "a" operator. The next item to be read into the push-down
list must enter level n+1.
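The selection or construction of a working-space name can be sketched as a free list backed by a counter. The Python below is only an illustration under stated assumptions: the class WorkingSpaces and the function operator_c are hypothetical names, a level is modelled as a dict, and the working-space name is taken to coincide with its absolute location, so that L(WS) = WS.

```python
class WorkingSpaces:
    """Hands out working-space names: reuse a released one, else construct one."""

    def __init__(self, base_address):
        self.next_address = base_address   # counter, initially the start of the block
        self.free = []                     # names returned by operator "a"

    def acquire(self):
        # Operator "c": select a name from the list, or construct a new one.
        if self.free:
            return self.free.pop()
        ws = self.next_address
        self.next_address += 1
        return ws

    def release(self, ws):
        # Operator "a": a working location used as an operand goes back to the list.
        self.free.append(ws)


def operator_c(function, level, spaces):
    """Fill one push-down level as described for (D:c:0)."""
    ws = spaces.acquire()
    level["AR"] = ws                # internal name, operand of a later "a" operator
    level["BR"] = (function, ws)    # D;L(WS), with L(WS) = WS assumed here
    level["CR_negative"] = True     # BR holds a single machine instruction
    return level
```

A working space released by "a" is reused before the counter is advanced again, so the block of working storage grows only when every location already constructed is in use.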
4.3 The Operator (D:d:0)


If this appears on level n in AR_n then we have in the result

    BR_n = D;0, the machine instruction with zero address.

    CR_n = -n, to show that there is a machine instruction in BR_n.

The next item to be placed in the push-down list must be placed on level n+1.

4.4 The Labeling Operator (0:x:1)


This has two cases, according as the operand is a machine instruction
within the push-down list or is a block of code assembled in its final position.

Case 1. The initial configuration of the push-down list is

    BR_n        holds a machine instruction
    CR_n        is negative
    LR_n        should be positive
    AR_{n+1}    (0:x:1)

This case is recognized by CR_n negative. LR_n should also be positive,
indicating an unlabeled instruction. The action of the labeling operator in this
case is to mark level n on the push-down list by making LR_n negative.

Case 2. In this case CR_n is positive, and LR_n is positive and contains the
address which will be the value of the label if one is required. This is
furnished by the operators "v", "w" or "y". The action is merely to
make LR_n negative.

In both cases the next item is read into position AR_{n+1}.
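Both cases reduce to a single change of sign, as the following Python sketch suggests. It is an illustration only: the function name is hypothetical, a level is modelled as a dict with boolean sign flags, and the treatment of an operand that is already labeled as an error is an assumption, the text saying only that LR should be positive.

```python
def label_operator_x(level):
    """Mark the operand level as labeled and report which case applied."""
    if level["LR_negative"]:
        # Assumed behaviour: an operand may be labeled only once.
        raise ValueError("operand is already labeled")
    if level["CR_negative"]:
        case = "single instruction"   # Case 1: instruction still in BR
    else:
        case = "program block"        # Case 2: LR already holds the label value
    level["LR_negative"] = True       # the only action in either case
    return case
```

The value position of LR is left untouched, so that in Case 2 the address furnished by "v", "w" or "y" survives the labeling.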
4.5 The Operator (0:v:r)

Suppose that this operator appears in AR_{n+r}; then its operands are in
levels n through n+r-1 of the push-down list. They may be machine instructions
still in the push-down list (recognized by the CR part of the level being
negative) or they may be blocks of machine code already stored.

The first action of the operator is to check that there is at most one
labeled operand, by testing all the LR positions of the operands; those levels
that are labeled will have negative LR.

Then the operands are taken in order and process A applied to those that
are single instructions still within the push-down list. Process A is common
to operators "v", "w" and "y": it places the single instructions in their
final positions in store, using L, which is incremented by 1 whenever single
instructions go to the store. If a conditional instruction is stored from the
push-down list in location L then L is copied into the CR position and L+1
is copied into the LR position. In both instances the signs describing
conditionality and labeling are preserved. At this point all the operands have
been stored in their final positions.
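Process A can be sketched as a single pass over the operand levels. The Python below is an illustration under stated assumptions, not the report's coding: levels are modelled as dicts, the store as a dict from address to instruction, and the flag "conditional", which says that the instruction in BR is a skip, is hypothetical, since the text does not say how such instructions are recognized. The sign flags are left as they were, following the text.

```python
def process_a(levels, store, L):
    """Move single instructions still in the push-down list into `store`.

    Returns the updated location counter.
    """
    for level in levels:
        if not level["CR_negative"]:
            continue                  # a program block, already in its final position
        store[L] = level["BR"]        # final position of the single instruction
        if level["conditional"]:
            level["CR"] = L           # location of the conditional instruction
            level["LR"] = L + 1       # address following it
        level["BR"] = None
        L += 1                        # one location used per instruction stored
    return L
```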

Now we must connect any skip exits from the operands. A single machine skip
instruction will reside in its final position with its transfer address zero,
and the corresponding CR position will point to the location of the
instruction. For program blocks the CR position will point to a location holding
one of the conditional instructions in the block. If the transfer address here is
zero, then this is the only conditional instruction in the block that contributes
to the skip exit. If the address of the conditional instruction is non-zero it
is pointing to another conditional instruction contributing to the skip exit.
Thus, the CR contents is the first of a chain of addresses ending with
address 0, which specify locations of instructions contributing to the skip
exit (except the last, 0).
In the "v" operator, these chains are linked together into a single chain,
which now shows which are the conditional instructions requiring transfer
addresses. The first member of this chain is stored in CR_n. In LR_n is
stored the absolute value of the label if any of the operands were labeled.
As usual, the sign of LR_n shows labeling.
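The chains described above are linked lists threaded through the address fields of the stored conditional instructions, with address 0 as terminator. Joining them can be sketched as follows in Python; the function names are hypothetical, the store is modelled as a dict from location to a pair (function, address), and location 0 is assumed never to hold a conditional instruction.

```python
def chain_members(store, head):
    """List the locations on a skip-exit chain that starts at `head`."""
    members = []
    while head != 0:
        members.append(head)
        head = store[head][1]         # the address field links to the next member
    return members


def link_chains(store, heads):
    """Join several chains into one; return the head of the result (0 if none)."""
    result = 0
    for head in reversed(heads):
        if head == 0:
            continue                  # this operand has no skip exit
        tail = head
        while store[tail][1] != 0:    # find the last member of this chain
            tail = store[tail][1]
        function, _ = store[tail]
        store[tail] = (function, result)   # point it at the chains joined so far
        result = head
    return result
```

The value returned is what would be placed in the CR position of the result.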

4.6 The Operator (0:w:2)

If any of the operands are single instructions then process A is
applied to them, reducing the operands to refer to program blocks in their
final position. The first and second operands are then checked for
conditionality and labeling respectively. Then process B is applied to link the
skip exits from the first operand with the label of the second operand, by
proceeding down the chain of locations into which the address value of the
label is to be inserted. Finally, the conditional information for the second
operand replaces CR_n to form the result on level n. The next item is to be
read into level n+1.
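Process B amounts to walking the chain and overwriting each link with the label address, a technique now commonly called backpatching. A minimal Python sketch follows; the function name is hypothetical and the store is modelled as a dict from location to a pair (function, address), with address 0 ending the chain.

```python
def process_b(store, chain_head, label_address):
    """Insert `label_address` into every instruction on the chain.

    Returns the number of instructions patched.
    """
    patched = 0
    location = chain_head
    while location != 0:
        function, next_location = store[location]   # read the link before overwriting it
        store[location] = (function, label_address)
        location = next_location
        patched += 1
    return patched
```

Each link must be read before it is overwritten, since the address field serves first as the chain pointer and afterwards as the transfer address.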

4.7 The Operator (0:y:2)

The action of this operator is almost identical with that of (0:w:2), but
the label from the first operand is used with the chain of locations of
conditional instructions of the second operand.

In the operators "v", "w" and "y" it may be necessary to provide a
label value in anticipation of the use of "x" to label the result. If the
result of the operation is an unlabeled block and process A has been used to
insert single instructions in their final locations, then the LR position of
the result should hold +, L+1, where L was the address of the location last
used by process A.
4.8 The Operator (0:e:0)

If this operator appears in AR_n, then the data on level n-2 of the
push-down list is placed on level n, and the registers on level n-2 are set to
zero to indicate nullity. The next item to be read to the push-down list
goes to AR_{n+1}.
Conclusion

This report has outlined a method by which a compiler can be programmed
(by syntax machine programs) to accept various source languages. Apart from
the final assembly of the postfix string to target-machine code, the method is
not particularly dependent on the computer making the translation, since the
compiler is constructed to perform interpretively on the syntax program and
on the syntactic operators in the postfix strings.

The syntax program will not be lengthy, as is demonstrated by the examples
of Part 2. Perhaps 300 - 400 instructions in the syntax program are sufficient.

The quality of the translation will be variable, since no method of
economisation of subexpressions is included, nor is any method of economisation
of index registers proposed. Methods for these could be developed, for example,
by modifying the syntax machine so that it could

(1) Analyze arithmetic expressions to produce the so-called three-address
form (this might require a right to left scan) and search for common
subexpressions among the output.

(2) Abstract from the source language some parts, e.g., subscripts and
loop control statements, for analysis by a more powerful symbol
manipulator, with re-insertion in the program by methods like those
of Part 3. This would require extensions to the syntax machine so
that its subprograms (recognizers) could be written with parameters.

The speed of translation is likely to be high; it is estimated that it would
take 1000 instructions in the computer making the translation to produce one
machine instruction of the translation. On the IBM 704, for example, this
means that translation is at the rate of 40 instructions per second.
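The arithmetic behind this estimate can be set out briefly. In the Python sketch below the function name is hypothetical, and the figure of 40 000 instructions per second is not quoted in the report; it is the machine speed implied by the two figures that are given (1000 and 40).

```python
def translated_per_second(machine_instructions_per_second, cost_per_translated_instruction):
    """Rate at which machine instructions of the translation are produced."""
    return machine_instructions_per_second / cost_per_translated_instruction


# Assumed machine speed, inferred from the text rather than stated in it.
rate = translated_per_second(40_000, 1000)
```

On a machine ten times as fast, the same cost of 1000 instructions per translated instruction would give ten times the rate.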

The major part of the syntax machine has been simulated on the IBM 650.
This interpretive simulation program required 60 instructions and simulated
comparators for single characters and the subroutine facilities described in
Part 1; the output mechanism was also simulated. Each pseudo-instruction required
two cells of storage. Some coding experiments indicate that the assembler will
require not more than about 400 instructions. Thus, it seems possible to write
a powerful compiler in 500 instructions plus the syntax program.